Chapter 1


A Geometrical Theory of 
Spacetime 


“I always get a slight brain-shiver, now [that] space and time appear
conglomerated together in a gray, miserable chaos.” — Sommerfeld 


This is a book about general relativity, at a level that is meant 
to be accessible to advanced undergraduates. 


This is mainly a book about general relativity, not special rel- 
ativity. I’ve heard the sentiment expressed that books on special 
relativity generally do a lousy job on special relativity, compared to 
books on general relativity. This is undoubtedly true, for someone 
who has already learned special relativity but wants to
unlearn the parts that are completely wrong in the broader context 
of general relativity. For someone who has not already learned spe- 
cial relativity, I strongly recommend mastering it first, from a book 
such as Taylor and Wheeler’s Spacetime Physics. 


In the back of this book I’ve included excerpts from three papers 
by Einstein — two on special relativity and one on general relativity. 
They can be read before, after, or along with this book. There are 
footnotes in the papers and in the main text linking their content 
with each other. 


I should reveal at the outset that I am not a professional rela- 
tivist. My field of research was nonrelativistic nuclear physics until 
I became a community college physics instructor. I can only hope 
that my pedagogical experience will compensate to some extent for 
my shallow background, and that readers who find mistakes will be 
kind enough to let me know about them using the contact information
provided at http://www.lightandmatter.com/area4author.html.




1.1 Time and causality 


Updating Plato’s allegory of the cave, imagine two super-intelligent 
twins, Alice and Betty. They’re raised entirely by a robotic tutor 
on a sealed space station, with no access to the outside world. The 
robot, in accord with the latest fad in education, is programmed to 
encourage them to build up a picture of all the laws of physics based 
on their own experiments, without a textbook to tell them the right 
answers. Putting yourself in the twins’ shoes, imagine giving up 
all your preconceived ideas about space and time, which may turn 
out according to relativity to be completely wrong, or perhaps only 
approximations that are valid under certain circumstances. 


Causality is one thing the twins will notice. Certain events re- 
sult in other events, forming a network of cause and effect. One 
general rule they infer from their observations is that there is an 
unambiguously defined notion of betweenness: if Alice observes that 
event 1 causes event 2, and then 2 causes 3, Betty always agrees that 
2 lies between 1 and 3 in the chain of causality. They find that this 
agreement holds regardless of whether one twin is standing on her 
head (i.e., it’s invariant under rotation), and regardless of whether 
one twin is sitting on the couch while the other is zooming around 
the living room in circles on her nuclear fusion scooter (i.e., it’s also 
invariant with respect to different states of motion). 


You may have heard that relativity is a theory that can be inter- 
preted using non-Euclidean geometry. The invariance of between- 
ness is a basic geometrical property that is shared by both Euclidean 
and non-Euclidean geometry. We say that they are both ordered 
geometries. With this geometrical interpretation in mind, it will 
be useful to think of events not as actual notable occurrences but 
merely as an ambient sprinkling of points at which things could hap- 
pen. For example, if Alice and Betty are eating dinner, Alice could 
choose to throw her mashed potatoes at Betty. Even if she refrains, 
there was the potential for a causal linkage between her dinner and 
Betty’s forehead. 


Betweenness is very weak. Alice and Betty may also make a 
number of conjectures that would say much more about causality. 
For example: (i) that the universe’s entire network of causality is 
connected, rather than being broken up into separate parts; (ii) that 
the events are globally ordered, so that for any two events 1 and 2, 
either 1 could cause 2 or 2 could cause 1, but not both; (iii) not only 
are the events ordered, but the ordering can be modeled by sorting 
the events out along a line, the time axis, and assigning a number t,
time, to each event. To see what these conjectures would entail, let’s 
discuss a few examples that may draw on knowledge from outside 
Alice and Betty’s experiences. 


Example: According to the Big Bang theory, it seems likely that 
the network is connected, since all events would presumably connect 
back to the Big Bang. On the other hand, if (i) were false we might
have no way of finding out, because the lack of causal connections 
would make it impossible for us to detect the existence of the other 
universes represented by the other parts disconnected from our own 
universe. 


Example: If we had a time machine,¹ we could violate (ii), but
this brings up paradoxes, like the possibility of killing one’s own 
grandmother when she was a baby, and in any case nobody knows 
how to build a time machine. 


Example: There are nevertheless strong reasons for believing 
that (ii) is false. For example, if we drop Alice into one black hole, 
and Betty into another, they will never be able to communicate 
again, and therefore there is no way to have any cause and effect 
relationship between Alice’s events and Betty’s.²


Since (iii) implies (ii), we suspect that (iii) is false as well. But 
Alice and Betty build clocks, and these clocks are remarkably suc- 
cessful at describing cause-and-effect relationships within the con- 
fines of the quarters in which they’ve lived their lives: events with 
higher clock readings never cause events with lower clock readings. 
They announce to their robot tutor that they’ve discovered a uni- 
versal thing called time, which explains all causal relationships, and 
which their experiments show flows at the same rate everywhere 
within their quarters. 
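The difference between a clock that merely respects causality and a true global ordering in the sense of (ii) can be seen in a toy model. The sketch below is only an illustration: the event names, the `CAUSES` table, the clock readings, and the helper functions are all made up for the purpose. It models the two-black-hole example as a small network of events and checks which pairs of events have no causal link in either direction.

```python
from itertools import combinations

# Hypothetical toy network: each event maps to the events it can directly cause.
# "split" is the moment the twins part; each then falls into her own black hole.
CAUSES = {
    "split": ["alice_1", "betty_1"],
    "alice_1": ["alice_2"],
    "alice_2": [],
    "betty_1": ["betty_2"],
    "betty_2": [],
}

def can_cause(a, b):
    """True if a chain of cause and effect leads from event a to event b."""
    stack, seen = [a], set()
    while stack:
        event = stack.pop()
        for nxt in CAUSES[event]:
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def unordered_pairs():
    """Pairs of events with no causal link either way (violations of (ii))."""
    return [(a, b) for a, b in combinations(CAUSES, 2)
            if not can_cause(a, b) and not can_cause(b, a)]

def respects_causality(clock):
    """True if every direct cause has a lower clock reading than its effect."""
    return all(clock[a] < clock[b] for a in CAUSES for b in CAUSES[a])

# Readings from a single shared clock (hypothetical numbers).
clock = {"split": 0, "alice_1": 1, "betty_1": 1, "alice_2": 2, "betty_2": 2}

print(unordered_pairs())
print(respects_causality(clock))
```

In this toy network the clock readings never contradict cause and effect, yet every pairing of one of Alice's later events with one of Betty's is causally unordered. A clock that works well is therefore not, by itself, evidence for (ii).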


“Ah,” the tutor sighs, his metallic voice trailing off. 


“I know that ‘ah’, Tutorbot,” Betty says. “Come on, can’t you
just tell us what we did wrong?” 


“You know that my pedagogical programming doesn’t allow that.” 


“Oh, sometimes I just want to strangle whoever came up with 
those stupid educational theories,” Alice says. 


The twins go on strike, protesting that the time theory works 
perfectly in every experiment they’ve been able to imagine. Tutor- 
bot gets on the commlink with his masters and has a long, inaudible 
argument, which, judging from the hand gestures, the twins imagine 
to be quite heated. He announces that he’s gotten approval for a 
field trip for one of the twins, on the condition that she remain in a 
sealed environment the whole time so as to maintain the conditions 
of the educational experiment. 


¹The possibility of having time come back again to the same point is often
referred to by physicists as a closed timelike curve (CTC). Kip Thorne, in his 
popularization Black Holes and Time Warps, recalls experiencing some anxiety 
after publishing a paper with “Time Machines” in the title, and later being 
embarrassed when a later paper on the topic was picked up by the National 
Enquirer with the headline PHYSICISTS PROVE TIME MACHINES EXIST. 
“CTC” is safer because nobody but physicists knows what it means.

²This point is revisited in section 6.1.


“Who gets to go?” Alice asks.

“Betty,” Tutorbot replies, “because of the mashed potatoes.” 
“But I refrained!” Alice says, stamping her foot. 

“Only one time out of the last six that I served them.” 


The next day, Betty, smiling smugly, climbs aboard the sealed 
spaceship carrying a duffel bag filled with a large collection of clocks 
for the trip. Each clock has a duplicate left behind with Alice. The 
clock design that they’re proudest of consists of a tube with two 
mirrors at the ends. A flash of light bounces back and forth between 
the ends, with each round trip counting as one “tick,” one unit of 
time. The twins are convinced that this one will run at a constant 
rate no matter what, since it has no moving parts that could be 
affected by the vibrations and accelerations of the journey. 


Betty’s field trip is dull. She doesn’t get to see any of the outside 
world. In fact, the only way she can tell she’s not still at home is that 
she sometimes feels strong sensations of acceleration. (She’s grown 
up in zero gravity, so the pressing sensation is novel to her.) She’s 
out of communication with Alice, and all she has to do during the 
long voyage is to tend to her clocks. As a crude check, she verifies 
that the light clock seems to be running at its normal rate, judged 
against her own pulse. The pendulum clock gets out of synch with 
the light clock during the accelerations, but that doesn’t surprise 
her, because it’s a mechanical clock with moving parts. All of the 
nonmechanical clocks seem to agree quite well. She gets hungry for 
breakfast, lunch, and dinner at the usual times. 


When Betty gets home, Alice asks, “Well?” 


“Great trip, too bad you couldn’t come. I met some cute boys, 
went out dancing, ...” 


“You did not. What about the clocks?” 


“They all checked out fine. See, Tutorbot? The time theory still 
holds up.” 


“That was an anticlimax,” Alice says. “I’m going back to bed
now.”


“Bed?” Betty exclaims. “It’s three in the afternoon.” 


The twins now discover that although all of Alice’s clocks agree 
among themselves, and similarly for all of Betty’s (except for the 
ones that were obviously disrupted by mechanical stresses), Alice’s 
and Betty’s clocks disagree with one another. A week has passed 
for Alice, but only a couple of days for Betty. 


1.2 Experimental tests of the nature of time 

1.2.1 The Hafele-Keating experiment


In 1971, J.C. Hafele and R.E. Keating³ of the U.S. Naval Observatory
brought atomic clocks aboard commercial airliners and went
around the world, once from east to west and once from west to east. 
(The clocks had their own tickets, and occupied their own seats.) As 
in the parable of Alice and Betty, Hafele and Keating observed that 
there was a discrepancy between the times measured by the trav- 
eling clocks and the times measured by similar clocks that stayed 
at the lab in Washington. The result was that the east-going clock 
lost an amount of time Δt_E = −59 ± 10 ns, while the west-going
one gained Δt_W = +273 ± 7 ns. This establishes that time is not
universal and absolute. 


Nevertheless, causality was preserved. The nanosecond-scale ef- 
fects observed were small compared to the three-day lengths of the 
plane trips. There was no opportunity for paradoxical situations 
such as, for example, a scenario in which the east-going experimenter 
arrived back in Washington before he left and then proceeded to 
convince himself not to take the trip. 
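To get a feel for just how small the effect is, here is a quick back-of-the-envelope check. It uses only the figures quoted above (a 59 ns shift and a trip of roughly three days); the variable names are mine.

```python
# Values quoted in the text: the east-going clock lost about 59 ns
# over a trip lasting roughly three days.
shift_s = 59e-9
trip_s = 3 * 24 * 3600

fraction = shift_s / trip_s
print(f"fractional discrepancy ~ {fraction:.1e}")
```

The discrepancy is a few parts in ten trillion, which is why nothing like it had been noticed with ordinary clocks.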


Hafele and Keating were testing specific quantitative predictions 
of relativity, and they verified them to within their experiment’s 
error bars. At this point in the book, we aren’t in possession of
enough relativity to be able to make such calculations, but, like
Alice and Betty, we can inspect the empirical results for clues as to
how time works.

a / The clock took up two seats, and two tickets were bought for it
under the name of “Mr. Clock.”


The opposite signs of the two results suggest that the rate at
which time flows depends on the motion of the observer. The east- 
going clock was moving in the same direction as the earth’s rotation, 
so its velocity relative to the earth’s center was greater than that of 
the ones that remained in Washington, while the west-going clock’s 
velocity was correspondingly reduced.⁴ The signs of the Δt’s show
that moving clocks were slower. 
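The sign pattern can be reproduced with a rough estimate. The sketch below is not Hafele and Keating's analysis. It assumes the standard low-velocity approximation that a clock moving at speed v runs slow by a fraction v²/(2c²), which has not been derived at this point in the book. The latitude, airliner speed, and time in the air are illustrative guesses, and only the motional effect is included.

```python
import math

C = 299_792_458.0            # speed of light, m/s
OMEGA = 7.292e-5             # Earth's rotation rate, rad/s
R_EARTH = 6.371e6            # Earth's mean radius, m

# Assumed, illustrative inputs (not taken from the experiment's flight logs).
LATITUDE_DEG = 39.0          # roughly Washington's latitude
PLANE_SPEED = 250.0          # airliner speed relative to the ground, m/s
FLIGHT_TIME = 45 * 3600.0    # time spent in the air, s

def kinematic_shift(ground_velocity):
    """Estimated gain (+) or loss (-) of the flying clock, in seconds,
    relative to the clock on the ground.

    ground_velocity is the plane's eastward speed relative to the ground.
    Speeds are measured relative to the Earth's center.
    """
    v_lab = OMEGA * R_EARTH * math.cos(math.radians(LATITUDE_DEG))
    v_plane = v_lab + ground_velocity
    return -(v_plane**2 - v_lab**2) / (2 * C**2) * FLIGHT_TIME

east = kinematic_shift(+PLANE_SPEED)
west = kinematic_shift(-PLANE_SPEED)
print(f"east: {east * 1e9:+.0f} ns, west: {west * 1e9:+.0f} ns")
```

With these assumed inputs the east-going clock comes out behind and the west-going clock ahead, by amounts on the order of a hundred nanoseconds, which matches the signs of the measured results.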


On the other hand, the asymmetry of the results, with |Δt_E| ≠
|Δt_W|, implies that there was a second effect involved, simply due
to the planes’ being up in the air. Relativity predicts that time’s 
rate of flow also changes with height in a gravitational field. The 
deeper reasons for such an effect are given in section 1.5.6 on page 
34. 


³Hafele and Keating, Science, 177 (1972), 168

⁴These differences in velocity are not simply something that can be eliminated
by choosing a different frame of reference, because the clocks’ motion isn’t in
a straight line. The clocks back in Washington, for example, have a certain
acceleration toward the earth’s axis, which is different from the accelerations
experienced by the traveling clocks.

Although Hafele and Keating’s measurements were on the ragged
edge of the state of the art in 1971, technology has now progressed
to the point where such effects have everyday consequences. The
satellites of the Global Positioning System (GPS) orbit at a speed
of 3.9 × 10³ m/s, an order of magnitude faster than a commercial
jet. Their altitude of 20,000 km is also much greater than that of 
an aircraft. For both these reasons, the relativistic effect on time is 
stronger than in the Hafele-Keating experiment. The atomic clocks 
aboard the satellites are tuned to a frequency of 10.22999999543 
MHz, which is perceived on the ground as 10.23 MHz. (This fre- 
quency shift will be calculated in example 11 on page 58.) 
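The size of the GPS correction can be read off directly from the two frequencies just quoted. The short sketch below does only that arithmetic. It does not derive the shift, and the variable names are mine.

```python
from fractions import Fraction

# Frequencies quoted in the text, in MHz, kept exact by parsing decimal strings.
f_satellite = Fraction("10.22999999543")   # as tuned on board
f_ground = Fraction("10.23")               # as perceived on the ground

fractional_offset = float((f_ground - f_satellite) / f_ground)
drift_per_day_us = fractional_offset * 86400 * 1e6   # microseconds per day

print(f"fractional offset ~ {fractional_offset:.3e}")
print(f"uncorrected drift ~ {drift_per_day_us:.1f} microseconds per day")
```

The offset is a few parts in 10¹⁰. Left uncorrected, it would accumulate to tens of microseconds per day, and light travels several kilometers in that time.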


1.2.2 Muons 


Although the Hafele-Keating experiment is impressively direct, 
it was not the first verification of relativistic effects on time, it did 
not completely separate the kinematic and gravitational effects, and 
the effect was small. An early experiment demonstrating a large and 
purely kinematic effect was performed in 1941 by Rossi and Hall, 
who detected cosmic-ray muons at the summit and base of Mount 
Washington in New Hampshire. The muon has a mean lifetime of 
2.2 μs, and the time of flight between the top and bottom of the
mountain (about 2 km for muons arriving along a vertical path)
at nearly the speed of light was about 7 μs, so in the absence of
relativistic effects, the flux at the bottom of the mountain should 
have been smaller than the flux at the top by about an order of 
magnitude. The observed ratio was much smaller, indicating that 
the “clock” constituted by nuclear decay processes was dramatically 
slowed down by the motion of the muons. 
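The nonrelativistic expectation follows from exponential decay. The sketch below uses the lifetime and path length given above. The factor `gamma`, by which the muon's "clock" is assumed to be slowed, is a free parameter, and the value 10 is purely hypothetical, chosen to show the trend rather than to match Rossi and Hall's data.

```python
import math

C = 299_792_458.0       # speed of light, m/s
MEAN_LIFETIME = 2.2e-6  # muon mean lifetime at rest, s (from the text)
HEIGHT = 2.0e3          # vertical path length, m (from the text)

def surviving_fraction(gamma=1.0):
    """Fraction of muons surviving the descent, travelling at nearly c.

    gamma = 1 means no relativistic slowing of the decay "clock";
    gamma > 1 stretches the lifetime by that (assumed) factor.
    """
    flight_time = HEIGHT / C
    return math.exp(-flight_time / (gamma * MEAN_LIFETIME))

print(f"no time dilation: {surviving_fraction():.3f}")
print(f"gamma = 10 (hypothetical): {surviving_fraction(10.0):.3f}")
```

Without any slowing, only a few percent of the muons would reach the bottom. Even a modest slowing factor lets most of them survive.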


1.2.3 Gravitational red-shifts 


The first experiment that isolated the gravitational effect on time 
was a 1925 measurement by W.S. Adams of the spectrum of light 
emitted from the surface of the white dwarf star Sirius B. The gravitational
field at the surface of Sirius B is 4 × 10⁵ g, and the gravitational
potential is about 3,000 times greater than at the Earth’s
surface. The emission lines of hydrogen were red-shifted, i.e., re- 
duced in frequency, and this effect was interpreted as a slowing of 
time at the surface of Sirius B relative to the surface of the Earth. Historically,
the mass and radius of Sirius B were not known with better
than order of magnitude precision in 1925, so this observation did 
not constitute a good quantitative test. 


The first such experiment to be carried out under controlled 
conditions, by Pound and Rebka in 1959, is analyzed quantitatively 
in example 7 on page 129. 


The first high-precision experiment of this kind was Gravity 
Probe A, a 1976 experiment⁵ in which a space probe was launched
vertically from Wallops Island, Virginia, at less than escape veloc- 
ity, to an altitude of 10,000 km, after which it fell back to earth and 
crashed down in the Atlantic Ocean. The probe carried a hydrogen
maser clock which was used to control the frequency of a radio
signal. The radio signal was received on the ground, the nonrelativistic
Doppler shift was subtracted out, and the residual blueshift
was interpreted as the gravitational effect on time, matching
the relativistic prediction to an accuracy of 0.01%.

⁵Vessot et al., Physical Review Letters 45 (1980) 2081
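For a sense of scale, the gravitational part of the Gravity Probe A effect can be estimated. The sketch below assumes the standard weak-field result that the fractional difference in clock rates equals the difference in gravitational potential divided by c², a formula that is not derived in this section. It ignores the probe's motion and the Earth's rotation, so it is an order-of-magnitude check and not the experiment's actual analysis.

```python
GM_EARTH = 3.986e14      # Earth's gravitational parameter, m^3/s^2
R_EARTH = 6.371e6        # Earth's mean radius, m
C = 299_792_458.0        # speed of light, m/s

def gravitational_shift(altitude_m):
    """Weak-field fractional rate difference between a clock at the given
    altitude and one on the ground (positive: the higher clock runs fast)."""
    potential_difference = GM_EARTH * (1 / R_EARTH - 1 / (R_EARTH + altitude_m))
    return potential_difference / C**2

peak = gravitational_shift(1.0e7)   # 10,000 km, the altitude quoted in the text
print(f"fractional shift at peak altitude ~ {peak:.2e}")
```

The result is a few parts in 10¹⁰, so checking it to 0.01% meant comparing clocks at roughly the level of parts in 10¹⁴.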


1.3 Non-simultaneity and the maximum speed 
of cause and effect 


We've seen that time flows at different rates for different observers. 
Suppose that Alice and Betty repeat their Hafele-Keating-style ex- 
periment, but this time they are allowed to communicate during 
the trip. Once Betty’s ship completes its initial acceleration away 
from Alice, she cruises at constant speed, and each girl has her own 
equally valid inertial frame of reference. Each twin considers herself 
to be at rest, and says that the other is the one who is moving. 
Each one says that the other’s clock is the one that is slow. If they 
could pull out their phones and communicate instantaneously, with 
no time lag for the propagation of the signals, they could resolve 
the controversy. Alice could ask Betty, “What time does your clock 
read right now?” and get an immediate answer back. 


By the symmetry of their frames of reference, however, it seems 
that Alice and Betty should not be able to resolve the controversy 
during Betty’s trip. If they could, then they could release two radar 
beacons that would permanently establish two inertial frames of 
reference, A and B, such that time flowed, say, more slowly in B 
than in A. This would violate the principle that motion is relative,
and that all inertial frames of reference are equally valid. The best
that they can do is to compare clocks once Betty returns, and verify
that the net result of the trip was to make Betty’s clock run more
slowly on the average.

b / Gravity Probe A.


Alice and Betty can never satisfy their curiosity about exactly 
when during Betty’s voyage the discrepancies accumulated or at 
what rate. This is information that they can never obtain, but 
they could obtain it if they had a system for communicating in- 
stantaneously. We conclude that instantaneous communication is 
impossible. There must be some maximum speed at which signals 
can propagate — or, more generally, a maximum speed at which 
cause and effect can propagate — and this speed must for example 
be greater than or equal to the speed at which radio waves propa- 
gate. It is also evident from these considerations that simultaneity 
itself cannot be a meaningful concept in relativity. 


1.4 Ordered geometry 


Let’s try to put what we’ve learned into a general geometrical con- 
text. 


Euclid’s familiar geometry of two-dimensional space has the following
axioms,⁶ which are expressed in terms of operations that can
be carried out with a compass and unmarked straightedge: 


E1 Two points determine a line. 
E2 Line segments can be extended. 


E3 A unique circle can be constructed given any point as its center 
and any line segment as its radius. 


E4 All right angles are equal to one another. 


E5 Parallel postulate: Given a line and a point not on the line, 
no more than one line can be drawn through the point and 
parallel to the given line.⁷


The modern style in mathematics is to consider this type of 
axiomatic system as a self-contained sandbox, with the axioms, and 
any theorems proved from them, being true or false only in relation 
to one another. Euclid and his contemporaries, however, believed 
them to be empirical facts about physical reality. For example, they 
considered the fifth postulate to be less obvious than the first four, 
because in order to verify physically that two lines were parallel, 
one would theoretically have to extend them to an infinite distance
and make sure that they never crossed. In the first 28 theorems of
the Elements, Euclid restricts himself entirely to propositions that
can be proved based on the more secure first four postulates. The
more general geometry defined by omitting the parallel postulate is
known as absolute geometry.

⁶These axioms are summarized for quick reference in the back of the book on
page 412.

⁷This is a form known as Playfair’s axiom, rather than the version of the
postulate originally given by Euclid.


What kind of geometry is likely to be applicable to general rel- 
ativity? We can see immediately that Euclidean geometry, or even 
absolute geometry, would be far too specialized. We have in mind 
the description of events that are points in both space and time. 
Confining ourselves for ease of visualization to one dimension worth 
of space, we can certainly construct a plane described by coordinates
(t, x), but imposing Euclid’s postulates on this plane results
in physical nonsense. Space and time are physically distinguishable 
from one another. But postulates 3 and 4 describe a geometry in 
which distances measured along non-parallel axes are comparable, 
and figures may be freely rotated without affecting the truth or 
falsehood of statements about them; this is only appropriate for a 
physical description of different spacelike directions, as in an (x, y)
plane whose two axes are indistinguishable. 


We need to throw most of the specialized apparatus of Euclidean 
geometry overboard. Once we’ve stripped our geometry to a bare 
minimum, then we can go back and build up a different set of equip- 
ment that will be better suited to relativity. 


The stripped-down geometry we want is called ordered geometry, 
and was developed by Moritz Pasch around 1882. As suggested by 
the parable of Alice and Betty, ordered geometry does not have 
any global, all-encompassing system of measurement. When Betty 
goes on her trip, she traces out a particular path through the space 
of events, and Alice, staying at home, traces another. Although 
events play out in cause-and-effect order along each of these paths, 
we do not expect to be able to measure times along paths A and B 
and have them come out the same. This is how ordered geometry 
works: points can be put in a definite order along any particular 
line, but not along different lines. Of the four primitive concepts 
used in Euclid’s E1-E5 — point, line, circle, and angle — only the 
non-metrical notions of point (i.e., event) and line are relevant in
ordered geometry. In a geometry without measurement, there is no 
concept of measuring distance (hence no compasses or circles), or of 
measuring angles. The notation [ABC] indicates that event B lies 
on a line segment joining A and C, and is strictly between them. 
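The betweenness relation can be made concrete in a few lines of code. This is only a sketch: representing a line as a list of event labels is my own assumption, and the labels are hypothetical. Note that the function only ever compares events on one line, and no numbers are attached to the events.

```python
def between(line, a, b, c):
    """Return True if [abc] holds: b lies strictly between a and c on `line`.

    `line` is a list of distinct event labels in their order along one line.
    """
    i, j, k = line.index(a), line.index(b), line.index(c)
    return i < j < k or k < j < i

# Hypothetical events along a single world-line.
betty_trip = ["launch", "lunch", "turnaround", "landing"]

print(between(betty_trip, "launch", "lunch", "landing"))    # [ABC]
print(between(betty_trip, "landing", "lunch", "launch"))    # [CBA], the reversal
print(between(betty_trip, "lunch", "landing", "launch"))    # [BCA]
```

The first two calls return True and the third returns False: reversing the order preserves betweenness, while cycling the three events does not. Asking about an event that is not on the line raises an error, which is the code's way of saying that events on different lines cannot be ordered.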


The axioms of ordered geometry are as follows:⁸


O1 Two events determine a line.

O2 Line segments can be extended: given A and B, there is at
least one event such that [ABC] is true.

O3 Lines don’t wrap around: if [ABC] is true, then [BCA] is false.

O4 Betweenness: For any three distinct events A, B, and C lying
on the same line, we can determine whether or not B is between
A and C (and by statement 3, this ordering is unique except
for a possible over-all reversal to form [CBA]).

a / Axioms O2 (left) and O3 (right).

O1-O2 express the same ideas as Euclid’s E1-E2. Not all lines
in the system will correspond physically to chains of causality; we
could have a line segment that describes a snapshot of a steel chain,
and O3-O4 then say that the order of the links is well defined. But
O3 and O4 also have clear physical significance for lines describing
causality. O3 forbids time travel paradoxes, like going back in time
and killing our own grandmother as a child; figure a illustrates why a
violation of O3 is referred to as a closed timelike curve. O4 says that
events are guaranteed to have a well-defined cause-and-effect order
only if they lie on the same line. This is completely different from
the attitude expressed in Newton’s famous statement: “Absolute,
true and mathematical time, of itself, and from its own nature flows
equably without regard to anything external ...”

If you’re dismayed by the austerity of a system of geometry without
any notion of measurement, you may be more appalled to learn
that even a system as weak as ordered geometry makes some statements
that are too strong to be completely correct as a foundation
for relativity. For example, if an observer falls into a black hole, at
some point he will reach a central point of infinite density, called a
singularity. At this point, his chain of cause and effect terminates,
violating O2. It is also an open question whether O3’s prohibition
on time-loops actually holds in general relativity; this is Stephen
Hawking’s playfully named chronology protection conjecture. We’ll
also see that in general relativity O1 is almost always true, but there
are exceptions.

b / Stephen Hawking (1942-).

⁸The axioms are summarized for convenient reference in the back of the book
on page 412. This is meant to be an informal, readable summary of the system,
pitched to the same level of looseness as Euclid’s E1-E5. Modern mathematicians
have found that systems like these actually need quite a bit more technical
machinery to be perfectly rigorous, so if you look up an axiomatization of ordered
geometry, or a modern axiomatization of Euclidean geometry, you’ll typically
find a much more lengthy list of axioms than the ones presented here. The
axioms I’m omitting take care of details like making sure that there are more
than two points in the universe, and that curves can’t cut through one another
without intersecting. The classic, beautifully written book on these topics is
H.S.M. Coxeter’s Introduction to Geometry, which is “introductory” in the sense
that it’s the kind of book a college math major might use in a first upper-division
course in geometry.
1.5.1 Proportionality of inertial and gravitational mass

What physical interpretation should we give to the “lines” described
in ordered geometry? Galileo described an experiment (which
he may or may not have actually performed) in which he simultaneously
dropped a cannonball and a musket ball from a tall tower. The
two objects hit the ground simultaneously, disproving Aristotle’s as- 
sertion that objects fell at a speed proportional to their weights. On 
a graph of spacetime with x and t axes, the curves traced by the two 
objects, called their world-lines, are identical parabolas. (The paths 
of the balls through x-y-z space are straight, not curved.) One
way of explaining this observation is that what we call “mass” is re- 
ally two separate things, which happen to be equal. Inertial mass, 
which appears in Newton’s a = F/m, describes how difficult it is
to accelerate an object. Gravitational mass describes the strength 
with which gravity acts. The cannonball has a hundred times more 
gravitational mass than the musket ball, so the force of gravity act- 
ing on it is a hundred times greater. But its inertial mass is also 
precisely a hundred times greater, so the two effects cancel out, and 
it falls with the same acceleration. This is a special property of the 
gravitational force. Electrical forces, for example, do not behave 
this way. The force that an object experiences in an electric field 
is proportional to its charge, which is unrelated to its inertial mass, 
so different charges placed in the same electric field will in general 
have different motions. 
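The cancellation is easy to see numerically. In the sketch below, the masses, the charge, and the electric field strength are hypothetical values chosen for illustration, and the function names are mine.

```python
G_FIELD = 9.8        # gravitational field strength near Earth's surface, N/kg
E_FIELD = 1.0e3      # hypothetical electric field strength, N/C

def accel_gravity(inertial_mass, gravitational_mass):
    """a = F/m, with F = (gravitational mass) * g."""
    return gravitational_mass * G_FIELD / inertial_mass

def accel_electric(inertial_mass, charge):
    """a = F/m, with F = (charge) * E."""
    return charge * E_FIELD / inertial_mass

# Hypothetical masses in kg; the cannonball is a hundred times heavier.
musket_ball, cannonball = 0.03, 3.0
print(accel_gravity(musket_ball, musket_ball), accel_gravity(cannonball, cannonball))

# The same hypothetical charge (1 microcoulomb) on both balls.
print(accel_electric(musket_ball, 1e-6), accel_electric(cannonball, 1e-6))
```

When the gravitational mass equals the inertial mass, the mass drops out and both balls accelerate at g. With equal charges in the same electric field, the accelerations differ by the same factor of a hundred as the masses.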


1.5.2 Geometrical treatment of gravity 


Einstein realized that this special property of the gravitational 
force made it possible to describe gravity in purely geometrical 
terms. We define the world-lines of small⁹ objects acted on by gravity
to be the lines described by the axioms of the geometry. Since
we normally think of the “lines” described by Euclidean geometry 
and its kin as straight lines, this amounts to a redefinition of what 
it means for a line to be straight. By analogy, imagine stretching a 
piece of string taut across a globe, as we might do in order to plan 
an airplane flight or aim a directional radio antenna. The string 
may not appear straight as viewed from the three-dimensional Eu- 
clidean space in which the globe is embedded, but it is as straight as 
possible in the sense that it is the path followed by a radio wave,¹⁰
or by an airplane pilot who keeps her wings level and her rudder 
straight. The world-“line” of an object acted on by nongravitational
forces is not considered to be a straight “line” in the sense of
O1-O4. When necessary, one eliminates this ambiguity in the overloaded
term “line” by referring to the lines of O1-O4 as geodesics.
The world-line of a low-mass object acted on only by gravity is one
type of geodesic.¹¹

⁹The reason for the restriction to small objects is essentially gravitational
radiation. The object should also be electrically neutral, and neither the object
nor the surrounding spacetime should contain any exotic forms of negative
energy. This is discussed in more detail on p. 312. See also problem 2 on p. 384.

¹⁰Radio waves in the HF band tend to be trapped between the ground and
the ionosphere, causing them to curve over the horizon, allowing long-distance
communication.

a / The cannonball and the musket ball have identical parabolic
world-lines. On this type of space-time plot, space is conventionally
shown on the horizontal axis, so the tower has to be depicted on its side.

b / A piece of string held taut on a globe forms a geodesic
from Mexico City to London. Although it appears curved, it
is the analog of a straight line in the non-Euclidean geometry
confined to the surface of the Earth. Similarly, the world-lines of
figure a appear curved, but they are the analogs of straight lines
in the non-Euclidean geometry used to describe gravitational
fields in general relativity.

c / Loránd Eötvös (1848-1919).

d / If the geodesics defined by an airplane and a radio wave
differ from one another, then it is not possible to treat both
problems exactly using the same geometrical theory. In general
relativity, this would be analogous to a violation of the equivalence
principle. General relativity’s validity as a purely geometrical
theory of gravity requires that the equivalence principle be exactly
satisfied in all cases.
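The taut string on the globe can be turned into a calculation. The sketch below finds the length of the geodesic between Mexico City and London using the haversine formula for a sphere. The city coordinates are approximate, the Earth is treated as a perfect sphere, and the function name is mine.

```python
import math

R_EARTH_KM = 6371.0   # mean radius of the Earth, km

def great_circle_km(lat1, lon1, lat2, lon2):
    """Length of the shortest path on a sphere (haversine formula).

    Inputs are in degrees, with west longitudes negative; output is in km.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * R_EARTH_KM * math.asin(math.sqrt(h))

# Approximate city-center coordinates (latitude, longitude).
mexico_city = (19.43, -99.13)
london = (51.51, -0.13)

print(f"{great_circle_km(*mexico_city, *london):.0f} km")
```

The answer comes out close to 9,000 km. Any other route confined to the surface, such as one that holds a constant compass bearing, is longer, and that is the sense in which the string is "straight."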


We can now see the deep physical importance of statement O1, 
that two events determine a line. To predict the trajectory of a 
golf ball, we need to have some initial data. For example, we could 
measure event A when the ball breaks contact with the club, and 
event B an infinitesimal time after A.¹² This pair of observations can
be thought of as fixing the ball’s initial position and velocity, which 
should be enough to predict a unique world-line for the ball, since 
relativity is a deterministic theory. With this interpretation, we can 
also see why it is not necessarily a disaster for the theory if O1 
fails sometimes. For example, event A could mark the launching of 
two satellites into circular orbits from the same place on the Earth, 
heading in opposite directions, and B could be their subsequent 
collision on the opposite side of the planet. Although this violates 
O1, it doesn’t violate determinism. Determinism only requires the 
validity of O1 for events infinitesimally close together. Even for 
randomly chosen events far apart, the probability that they will 
violate O1 is zero. 


1.5.3 Eötvös experiments


Einstein’s entire system breaks down if there is any violation, no 
matter how small, of the proportionality between inertial and grav- 
itational mass, and it therefore becomes very interesting to search 
experimentally for such a violation. For example, we might won- 
der whether neutrons and protons had slightly different ratios of 
gravitational and inertial mass, which in a Galileo-style experiment 
would cause a small difference between the acceleration of a lead 
weight, with a large neutron-to-proton ratio, and a wooden one, 
which consists of light elements with nearly equal numbers of neu- 
trons and protons. The first high-precision experiments of this type 
were performed by Eötvös around the turn of the twentieth century,
and they verified the equivalence of inertial and gravitational mass
to within about one part in $10^8$. These are generically referred to
as Eötvös experiments.
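
To get a feel for why a Galileo-style drop is a hard way to look for such a violation, here is a minimal numerical sketch. The fractional difference in acceleration `eta`, the 50 m drop height, and the function name are illustrative assumptions of mine, not values taken from any experiment described here.

```python
import math

G_ACCEL = 9.8  # m/s^2, approximate acceleration of gravity near the earth's surface


def drop_separation(height_m, eta):
    """Gap between two dropped objects whose accelerations differ by the
    fraction eta, at the moment the faster one has fallen height_m."""
    t = math.sqrt(2.0 * height_m / G_ACCEL)        # fall time of the faster object
    d_fast = 0.5 * G_ACCEL * t**2                  # equals height_m
    d_slow = 0.5 * G_ACCEL * (1.0 - eta) * t**2    # slower object lags behind
    return d_fast - d_slow


if __name__ == "__main__":
    for eta in (1e-3, 1e-8):
        gap = drop_separation(50.0, eta)
        print(f"eta = {eta:g}: gap after a 50 m drop = {gap:.3g} m")
```

The gap is just `eta` times the drop height, so a violation at the level of one part in $10^8$ would open up a gap of only about half a micrometer over 50 m.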


Figure e shows a strategy for doing Eötvös experiments that al-
lowed a test to about one part in $10^{12}$. The top panel is a simplified
version. The platform is balanced, so the gravitational masses of 
the two objects are observed to be equal. The objects are made 
of different substances. If the equivalence of inertial and gravita- 
tional mass fails to hold for these two substances, then the force 
of gravity on each mass will not be in exact proportion to its inertia,
and the platform will experience a slight torque as the earth spins. 


¹¹ For more justification of this statement, see ch. 9, problem 2, on page 384.
¹² Regarding infinitesimals, see p. 94.
The bottom panel shows a more realistic drawing of an experiment
by Braginskii and Panov.¹³ The whole thing was encased in a tall
vacuum tube, which was placed in a sealed basement whose tem- 
perature was controlled to within 0.02°C. The total mass of the 
platinum and aluminum test masses, plus the tungsten wire and 
the balance arms, was only 4.4 g. To detect tiny motions, a laser 
beam was bounced off of a mirror attached to the wire. There was 
so little friction that the balance would have taken on the order of 
several years to calm down completely after being put in place; to 
stop these vibrations, static electrical forces were applied through 
the two circular plates to provide very gentle twists on the ellipsoidal 
mass between them. 


In the 45 years since Braginskii and Panov’s work, improvements 
have been made in more direct experimental tests of the equivalence 
principle, in which the test masses simply free-fall. The best earth-
bound experiment of this type¹⁴ has given a bound of $10^{-9}$, while a
new experiment in orbit¹⁵ has tightened this to $10^{-14}$.


Equivalence of gravitational fields and accelerations 


One consequence of the Eötvös experiments’ null results is that
it is not possible to tell the difference between an acceleration and 
a gravitational field. At certain times during Betty’s field trip, she 
feels herself pressed against her seat, and she interprets this as ev- 
idence that she’s in a space vessel that is undergoing violent accel- 
erations and decelerations. But it’s equally possible that Tutorbot 
has simply arranged for her capsule to be hung from a rope and 
dangled into the gravitational field of a planet. Suppose that the 
first explanation is correct. The capsule is initially at rest in outer 
space, where there is no gravity. Betty can release a pencil and a 
lead ball in the air inside the cabin, and they will stay in place. The 
capsule then accelerates, and to Betty, who has adopted a frame 
of reference tied to its deck, ceiling and walls, it appears that the 
pencil and the ball fall to the deck. They are guaranteed to stay 
side by side until they hit the deckplates, because in fact they aren’t 
accelerating; they simply appear to accelerate, when in reality it’s 
the deckplates that are coming up and hitting them. But now con- 
sider the second explanation, that the capsule has been dipped into 
a gravitational field. The ball and the pencil will still fall side by 
side to the floor, because they have the same ratio of gravitational 
to inertial mass. 


¹³ V. B. Braginskii and V. I. Panov, Soviet Physics JETP 34, 463 (1972).

¹⁴ Carusotto et al., “Limits on the violation of g-universality with a Galileo-
type experiment,” Phys. Lett. A183 (1993) 355. Freely available online at
researchgate.net.

¹⁵ Touboul et al., “The MICROSCOPE mission: first results of a space test of
the Equivalence Principle,” arxiv.org/abs/1712.01176


e / An Eötvös experiment.
Top: simplified version. Bottom:
realistic version by Braginskii and
Panov. (Drawing after Braginskii
and Panov.)
1.5.4 The equivalence principle


This leads to one way of stating a central principle of relativity 
known as the equivalence principle: Accelerations and gravitational 
fields are equivalent. There is no experiment that can distinguish 
one from the other.¹⁶


To see what a radical departure this is, we need to compare with 
the completely different picture presented by Newtonian physics and 
special relativity. Newton’s law of inertia states that “Every object 
perseveres in its state of rest, or of uniform motion in a straight 
line, unless it is compelled to change that state by forces impressed 
thereon.”¹⁷ Newton’s intention here was to clearly state a contra-
diction of Aristotelian physics, in which objects were supposed to
naturally stop moving and come to rest in the absence of a force. For 
Aristotle, “at rest” meant at rest relative to the Earth, which repre- 
sented a special frame of reference. But if motion doesn’t naturally 
stop of its own accord, then there is no longer any way to single out 
one frame of reference, such as the one tied to the Earth, as being 
special. An equally good frame of reference is a car driving in a 
straight line down the interstate at constant speed. The earth and 
the car both represent valid inertial frames of reference, in which 
Newton’s law of inertia is valid. On the other hand, there are other, 
noninertial frames of reference, in which the law of inertia is vio- 
lated. For example, if the car decelerates suddenly, then it appears 
to the people in the car as if their bodies are being jerked forward, 
even though there is no physical object that could be exerting any 
type of forward force on them. This distinction between inertial and 
noninertial frames of reference was carried over by Einstein into his 
theory of special relativity, published in 1905. 


But by the time he published the general theory in 1915, Einstein 
had realized that this distinction between inertial and noninertial 
frames of reference was fundamentally suspect. How do we know 
that a particular frame of reference is inertial? One way is to verify 
that its motion relative to some other inertial frame, such as the 
Earth’s, is in a straight line and at constant speed. But how does 
the whole thing get started? We need to bootstrap the process 
with at least one frame of reference to act as our standard. We 
can look for a frame in which the law of inertia is valid, but now 
we run into another difficulty. To verify that the law of inertia 
holds, we have to check that an observer tied to that frame doesn’t 
see objects accelerating for no reason. The trouble here is that by 
the equivalence principle, there is no way to determine whether the 
object is accelerating “for no reason” or because of a gravitational 
force. Betty, for example, cannot tell by any local measurement 
(i.e., any measurement carried out within the capsule) whether she


¹⁶ This statement of the equivalence principle is summarized, along with some
other forms of it to be encountered later, in the back of the book on page 413.
¹⁷ Paraphrased from a translation by Motte, 1729.
is in an inertial or a noninertial frame.


f / Wouldn't it be nice if we could define the meaning of a Newtonian inertial frame of reference? Newton
makes it sound easy: to define an inertial frame, just find some object that is not accelerating because it is
not being acted on by any external forces. But what object would we use? The earth? The “fixed stars?” Our 
galaxy? Our supercluster of galaxies? All of these are accelerating — relative to something. 


We could hope to resolve the ambiguity by making non-local 
measurements instead. For example, if Betty had been allowed to 
look out a porthole, she could have tried to tell whether her capsule 
was accelerating relative to the stars. Even this possibility ends up 
not being satisfactory. The stars in our galaxy are moving in circular 
orbits around the galaxy. On an even larger scale, the universe is 
expanding in the aftermath of the Big Bang. It spent about the 
first half of its history decelerating due to gravitational attraction, 
but the expansion is now observed to be accelerating, apparently 
due to a poorly understood phenomenon referred to by the catch-all 
term “dark energy.” In general, there is no distant background of 
physical objects in the universe that is not accelerating. 


Lorentz frames 


The conclusion is that we need to abandon the entire distinction 
between Newton-style inertial and noninertial frames of reference. 
The best that we can do is to single out certain frames of reference 
defined by the motion of objects that are not subject to any non- 
gravitational forces. A falling rock defines such a frame of reference. 
In this frame, the rock is at rest, and the ground is accelerating. The 
rock’s world-line is a straight line of constant x = 0 and varying t.
Such a free-falling frame of reference is called a Lorentz frame. The 
frame of reference defined by a rock sitting on a table is an inertial 
frame of reference according to the Newtonian view, but it is not a 
Lorentz frame. 


In Newtonian physics, inertial frames are preferable because they 
make motion simple: objects with no forces acting on them move 
along straight world-lines. Similarly, Lorentz frames occupy a privi- 
leged position in general relativity because they make motion simple: 
objects move along “straight” world-lines if they have no nongravi- 
tational forces acting on them. 


g / An artificial horizon.
h / Bars of upsidasium are
kept in special warehouses,
bolted to the ground. Copyright
Jay Ward Productions, used
under U.S. fair use exception to
copyright law.
Example 1: The artificial horizon
The pilot of an airplane cannot always easily tell which way is up. 
The horizon may not be level simply because the ground has an 
actual slope, and in any case the horizon may not be visible if the 
weather is foggy. One might imagine that the problem could be 
solved simply by hanging a pendulum and observing which way 
it pointed, but by the equivalence principle the pendulum cannot 
tell the difference between a gravitational field and an acceler- 
ation of the aircraft relative to the ground — nor can any other 
accelerometer, such as the pilot’s inner ear. For example, when 
the plane is turning to the right, accelerometers will be tricked into 
believing that “down” is down and to the left. To get around this 
problem, airplanes use a device called an artificial horizon, which 
is essentially a gyroscope. The gyroscope has to be initialized 
when the plane is known to be oriented in a horizontal plane. No 
gyroscope is perfect, so over time it will drift. For this reason the 
instrument also contains an accelerometer, and the gyroscope is 
automatically restored to agreement with the accelerometer, with 
a time-constant of several minutes. If the plane is flown in cir- 
cles for several minutes, the artificial horizon will be fooled into 
indicating that the wrong direction is vertical. 
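
The size of the error made by a pendulum in a turn can be estimated with a short sketch. It assumes a level circular turn at constant speed, so that the only acceleration relative to the ground is centripetal; the speed, the turn radius, and the function name are hypothetical choices for illustration.

```python
import math

G_ACCEL = 9.8  # m/s^2, approximate acceleration of gravity


def apparent_tilt_deg(speed_m_s, turn_radius_m):
    """Angle between the true vertical and the direction in which a pendulum
    hangs inside a plane flying a level circle at constant speed."""
    a_centripetal = speed_m_s**2 / turn_radius_m  # points toward the center of the turn
    # The pendulum lines up with the vector sum of gravity and the
    # acceleration's apparent effect, which points away from the turn's center.
    return math.degrees(math.atan2(a_centripetal, G_ACCEL))


if __name__ == "__main__":
    tilt = apparent_tilt_deg(100.0, 2000.0)
    print(f"apparent 'down' is tilted {tilt:.1f} degrees away from the turn")
```

For a right turn the tilt is toward the left, which is the sense in which an accelerometer reports that “down” is down and to the left.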


Example 2: No antigravity
This whole chain of reasoning was predicated on the null results
of Eötvös experiments. In the Rocky and Bullwinkle cartoons,
there is a non-Eötvösian substance called upsidasium, which falls
up instead of down. Its ratio of gravitational to inertial mass is 
apparently negative. If such a substance could be found, it would 
falsify the equivalence principle. Cf. example 10, p. 315. 


Operational definition of a Lorentz frame 


We can define a Lorentz frame in operational terms using an ide- 
alized variation (figure i) on a device actually built by Harold Waage 
at Princeton as a lecture demonstration to be used by his partner 
in crime John Wheeler. Build a sealed chamber whose contents are 
isolated from all nongravitational forces. Of the four known forces 
of nature, the three we need to exclude are the strong nuclear force, 
the weak nuclear force, and the electromagnetic force. The strong 
nuclear force has a range of only about 1 fm ($10^{-15}$ m), so to ex-
clude it we merely need to make the chamber thicker than that, and
also surround it with enough paraffin wax to keep out any neutrons 
that happen to be flying by. The weak nuclear force also has a short 
range, and although shielding against neutrinos is a practical impos- 
sibility, their influence on the apparatus inside will be negligible. To 
shield against electromagnetic forces, we surround the chamber with 
a Faraday cage and a solid sheet of mu-metal. Finally, we make sure 
that the chamber is not being touched by any surrounding matter, 
so that short-range residual electrical forces (sticky forces, chem-
ical bonds, etc.) are excluded. That is, the chamber cannot be
supported; it is free-falling. 


Crucially, the shielding does not exclude gravitational forces. 
There is in fact no known way of shielding against gravitational ef- 
fects such as the attraction of other masses (example 10, p. 315) or 
the propagation of gravitational waves (ch. 9). Because the shield- 
ing is spherical, it exerts no gravitational force of its own on the 
apparatus inside. 


Inside, an observer carries out an initial calibration by firing 
bullets along three Cartesian axes and tracing their paths, which 
she defines to be linear. 


We've gone to elaborate lengths to show that we can really de- 
termine, without reference to any external reference frame, that 
the chamber is not being acted on by any nongravitational forces, 
so that we know it is free-falling. In addition, we also want the 
observer to be able to tell whether the chamber is rotating. She 
could look out through a porthole at the stars, but that would be 
missing the whole point, which is to show that without reference to 
any other object, we can determine whether a particular frame is a 
Lorentz frame. One way to do this would be to watch for precession 
of a gyroscope. Or, without having to resort to additional appara- 
tus, the observer can check whether the paths traced by the bullets 
change when she changes the muzzle velocity. If they do, then she 
infers that there are velocity-dependent Coriolis forces, so she must 
be rotating. She can then use flywheels to get rid of the rotation, 
and redo the calibration. 
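
The reason the muzzle velocity matters can be seen in a rough sketch. It assumes a slowly rotating chamber, a bullet fired perpendicular to the rotation axis, and a flight short enough that only the first-order Coriolis term counts; the rotation rate, the chamber size, and the function name are hypothetical.

```python
def coriolis_deflection(omega_rad_s, path_length_m, muzzle_speed_m_s):
    """First-order sideways deflection of a bullet in a rotating frame.

    Valid only when omega * (flight time) is much less than 1."""
    flight_time = path_length_m / muzzle_speed_m_s
    coriolis_accel = 2.0 * omega_rad_s * muzzle_speed_m_s  # magnitude of 2 * omega x v
    return 0.5 * coriolis_accel * flight_time**2           # equals omega * L^2 / v


if __name__ == "__main__":
    for v in (100.0, 400.0):
        d = coriolis_deflection(0.01, 2.0, v)
        print(f"muzzle speed {v:.0f} m/s: deflection = {d:.2e} m")
```

Because the deflection goes like the rotation rate times the path length squared divided by the speed, a faster bullet traces a straighter path, and it is this dependence on speed that reveals the rotation.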


i/ The spherical chamber, shown 
in a cutaway view, has layers 
of shielding to exclude all known 
nongravitational forces. Once 
the chamber has been calibrated 
by marking the three dashed-line 
trajectories under free-fall con- 
ditions, an observer inside the 
chamber can always tell whether 
she is in a Lorentz frame. 
j / Two local Lorentz frames.


After the initial calibration, she can always tell whether or not 
she is in a Lorentz frame. She simply has to fire the bullets, and see 
whether or not they follow the precalibrated paths. For example, 
she can detect that the frame has become non-Lorentzian if the 
chamber is rotated, allowed to rest on the ground, or accelerated by 
a rocket engine. 


It may seem that the detailed construction of this elaborate 
thought-experiment does nothing more than confirm something ob- 
vious. It is worth pointing out, then, that we don’t really know 
whether it works or not. It works in general relativity, but there are 
other theories of gravity, such as Brans-Dicke gravity (p. 357), that 
are also consistent with all known observations, but in which the 
apparatus in figure i doesn’t work. Two of the assumptions made 
above fail in this theory: gravitational shielding effects exist, and 
Coriolis effects become undetectable if there is not enough other 
matter nearby. 


Locality of Lorentz frames 


It would be convenient if we could define a single Lorentz frame 
that would cover the entire universe, but we can’t. In figure j, two 
girls simultaneously drop down from tree branches — one in Los An- 
geles and one in Mumbai. The girl free-falling in Los Angeles defines 
a Lorentz frame, and in that frame, other objects falling nearby will 
also have straight world-lines. But in the LA girl’s frame of refer- 
ence, the girl falling in Mumbai does not have a straight world-line: 
she is accelerating up toward the LA girl with an acceleration of 
about 2g. 
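
The figure of about 2g can be checked with a small sketch. It assumes a nonrotating spherical earth with the same g everywhere, and the city coordinates are approximate values I have supplied for illustration.

```python
import math

# Approximate (latitude, longitude) in degrees; assumed values for illustration.
LOS_ANGELES = (34.05, -118.24)
MUMBAI = (19.08, 72.88)


def unit_vector(lat_deg, lon_deg):
    """Unit vector from the earth's center to a point on its surface."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def relative_acceleration_in_g(place_a, place_b):
    """Magnitude of the difference between two free-fall accelerations, in units of g.

    Each falling object accelerates toward the earth's center, a = -g * (unit vector)."""
    ua = unit_vector(*place_a)
    ub = unit_vector(*place_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ua, ub)))


if __name__ == "__main__":
    print(f"{relative_acceleration_in_g(LOS_ANGELES, MUMBAI):.2f} g")
```

With these coordinates the result is roughly 1.8g. Exactly antipodal points would give 2g, so "about 2g" is a fair description.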


A second way of stating the equivalence principle is that it is 
always possible to define a local Lorentz frame in a particular neigh-
borhood of spacetime.¹⁸ It is not possible to do so on a universal
basis. 


The locality of Lorentz frames can be understood in the anal- 
ogy of the string stretched across the globe. We don’t notice the 
curvature of the Earth’s surface in everyday life because the radius 
of curvature is thousands of kilometers. On a map of LA, we don’t 
notice any curvature, nor do we detect it on a map of Mumbai, but 
it is not possible to make a flat map that includes both LA and 
Mumbai without seeing severe distortions. 


Terminology 


The meanings of words evolve over time, and since relativity is 
now a century old, there has been some confusing semantic drift 
in its nomenclature. This applies both to “inertial frame” and to 
“special relativity.” 


¹⁸ This statement of the equivalence principle is summarized, along with some
other forms of it, in the back of the book on page 413.
Early formulations of general relativity never refer to “inertial
frames,” “Lorentz frames,” or anything else of that flavor. The very
first topic in Einstein’s original systematic presentation of the the-
ory¹⁹ is an example (figure k) involving two planets, the purpose
of which is to convince the reader that all frames of reference are 
created equal, and that any attempt to make some of them into 
second-class citizens is invidious. Other treatments of general rel-
ativity from the same era follow Einstein’s lead.²⁰ The trouble is
that this example is more a statement of Einstein’s aspirations for 
his theory than an accurate depiction of the physics that it actu- 
ally implies. General relativity really does allow an unambiguous 
distinction to be made between Lorentz frames and non-Lorentz 
frames, as described on p. 26. Einstein’s statement should have 
been weaker: the laws of physics (such as the Einstein field equa- 
tion, p. 295) are the same in all frames (Lorentz or non-Lorentz). 
This is different from the situation in Newtonian mechanics and spe- 
cial relativity, where the laws of physics take on their simplest form 
only in Newton-inertial frames. 


Because Einstein didn’t want to make distinctions between frames, 
we ended up being saddled with inconvenient terminology for them. 
The least verbally awkward choice is to hijack the term “inertial,” 
redefining it from its Newtonian meaning. We then say that the 
Earth’s surface is not an inertial frame, in the context of general 
relativity, whereas in the Newtonian context it is an inertial frame 
to a very good approximation. This usage is fairly standard,²¹ but
would have made Newton confused and Einstein unhappy. If we
follow this usage, then we may sometimes have to say “Newtonian-
inertial” or “Einstein-inertial.” A more awkward, but also more
precise, term is “Lorentz frame,” as used in this book; this seems to
be widely understood.²²


The distinction between special and general relativity has under- 
gone a similar shift over the decades. Einstein originally defined the 
distinction in terms of the admissibility of accelerated frames of ref- 
erence. This, however, puts us in the absurd position of saying that 
special relativity, which is supposed to be a generalization of Newto- 
nian mechanics, cannot handle accelerated frames of reference in the 
same way that Newtonian mechanics can. In fact both Newtonian 
mechanics and special relativity treat Newtonian-noninertial frames 
of reference in the same way: by modifying the laws of physics so 
that they do not take on their most simple form (e.g., violating New- 


¹⁹ Einstein, “The Foundation of the General Theory of Relativity,” 1916. An
excerpt is given on p. ??.

²⁰ Two that I believe were relatively influential are Born’s 1920 Einstein’s The-
ory of Relativity and Eddington’s 1924 The Mathematical Theory of Relativity.
Born follows Einstein’s “Foundation” paper slavishly. Eddington seems only to
mention inertial frames in a few places where the context is Newtonian.

²¹ Misner, Thorne, and Wheeler, Gravitation, 1973, p. 18

²² ibid., p. 19


k/One planet rotates about 
its axis and the other does not. 
As discussed in more detail on 
p. 116, Einstein believed that 
general relativity was even more 
radically egalitarian about frames 
of reference than it really is. He 
thought that if the planets were 
alone in an otherwise empty 
universe, there would be no way 
to tell which planet was really 
rotating and which was not, so 
that B’s equatorial bulge would 
have to disappear. There would 
be no way to tell which planet’s 
surface was a Lorentz frame. 
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ton’s third law), while retaining the ability to change coordinates 
back to a preferred frame in which the simpler laws apply. It was 
realized fairly early on²³ that the important distinction was between
special relativity as a theory of flat spacetime, and general relativity 
as a theory that described gravity in terms of curved spacetime. All 
relativists writing since about 1950 seem to be in agreement on this 
more modern redefinition of the terms.²⁴


In an accelerating frame, the equivalence principle tells us that 
measurements will come out the same as if there were a gravitational 
field. But if the spacetime is flat, describing it in an accelerating 
frame doesn’t make it curved. (Curvature is a physical property of 
spacetime, and cannot be changed from zero to nonzero simply by 
a choice of coordinates.) Thus relativity allows us to have gravita- 
tional fields in flat space — but only for certain special configura- 
tions like this one. Special relativity is capable of operating just fine 
in this context. For example, Chung et al.²⁵ did a high-precision
test of special relativity in 2009 using a matter interferometer in a 
vertical plane, specifically in order to test whether there was any 
violation of special relativity in a uniform gravitational field. Their 
experiment is interpreted purely as a test of special relativity, not 
general relativity. 


Chiao’s paradox 


The remainder of this subsection deals with the subtle ques- 
tion of whether and how the equivalence principle can be applied to 
charged particles. You may wish to skip it on a first reading. The 
short answer is that using the equivalence principle to make con- 
clusions about charged particles is like the attempts by slaveholders 
and abolitionists in the 19th century U.S. to support their positions 
based on the Bible: you can probably prove whichever conclusion 
was the one you set out to prove. 


The equivalence principle is not a single, simple, mathemati-
cally well defined statement.²⁶ As an example of an ambiguity that
is still somewhat controversial, 90 years after Einstein first proposed
the principle, consider the question of whether or not it applies to
charged particles. Raymond Chiao²⁷ proposes the following thought
experiment, which I’ll refer to as Chiao’s paradox. Let a neutral par- 
ticle and a charged particle be set, side by side, in orbit around the 


²³ Eddington, op. cit.

²⁴ Misner, Thorne, and Wheeler, op. cit., pp. 163-164. Penrose, The Road to
Reality, 2004, p. 422. Taylor and Wheeler, Spacetime Physics, 1992, p. 132.
Schutz, A First Course in General Relativity, 2009, pp. 3, 141. Hobson, General
Relativity: An Introduction for Physicists, 2005, sec. 1.14.

²⁵ arxiv.org/abs/0905.1929

²⁶ A good recent discussion of this is “Theory of gravitation theories: a no-
progress report,” Sotiriou, Faraoni, and Liberati, http://arxiv.org/abs/0707.2748

²⁷ arxiv.org/abs/quant-ph/0601193v7
earth. Assume (unrealistically) that the space around the earth has
no electric or magnetic field. If the equivalence principle applies 
regardless of charge, then these two particles must go on orbiting 
amicably, side by side. But then we have a violation of conservation 
of energy, since the charged particle, which is accelerating, will radi- 
ate electromagnetic waves (with very low frequency and amplitude). 
It seems as though the particle’s orbit must decay. 


The resolution of the paradox, as demonstrated by hairy cal-
culations,²⁸ is interesting because it exemplifies the local nature of
the equivalence principle. When a charged particle moves through a 
gravitational field, in general it is possible for the particle to experi- 
ence a reaction from its own electromagnetic fields. This might seem 
impossible, since an observer in a frame momentarily at rest with 
respect to the particle sees the radiation fly off in all directions at 
the speed of light. But there are in fact several different mechanisms 
by which a charged particle can be reunited with its long-lost elec- 
tromagnetic offspring. An example (not directly related to Chiao’s 
scenario) is the following. 


Bring a laser very close to a black hole, but not so close that it 
has strayed inside the event horizon, which is the spherical point of 
no return from within which nothing can escape. Example 14 on 
page 64 gives a plausibility argument based on Newtonian physics 
that the radius²⁹ of the event horizon should be something like
$r_H = GM/c^2$, and section 6.3.2 on page 237 derives the relativistically
correct factor of 2 in front, so that $r_H = 2GM/c^2$. It turns out that
at $r = (3/2)r_H$, a ray of light can have a circular orbit around the
black hole. Since this is greater than $r_H$, we can, at least in theory,
hold the laser stationary at this value of r using a powerful rocket 
engine. If we point the laser in the azimuthal direction, its own 
beam will come back and hit it. 
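
To put numbers on these radii, here is a minimal sketch that evaluates the horizon radius with the factor of 2 and the radius of the circular light orbit. The choice of a black hole with the mass of the sun is my own illustrative assumption, and the constants are rounded.

```python
G = 6.674e-11     # m^3 kg^-1 s^-2, Newton's gravitational constant
C = 2.998e8       # m/s, speed of light
M_SUN = 1.989e30  # kg, mass of the sun (hypothetical choice of black hole mass)


def horizon_radius(mass_kg):
    """Event horizon radius, 2GM/c^2, with the relativistically correct factor of 2."""
    return 2.0 * G * mass_kg / C**2


def light_orbit_radius(mass_kg):
    """Radius at which a ray of light can orbit in a circle, (3/2) times the horizon radius."""
    return 1.5 * horizon_radius(mass_kg)


if __name__ == "__main__":
    print(f"horizon:     {horizon_radius(M_SUN) / 1000:.2f} km")
    print(f"light orbit: {light_orbit_radius(M_SUN) / 1000:.2f} km")
```

For one solar mass the horizon comes out at roughly 3 km, so the laser in this thought experiment would be hovering about a kilometer and a half farther out in the r coordinate.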


Since matter can experience a back-reaction from its own elec- 
tromagnetic radiation, it becomes plausible how the paradox can be 
resolved. The equivalence principle holds locally, i.e., within a small 
patch of space and time. If Chiao’s charged and neutral particle are 
released side by side, then they will obey the equivalence principle 
for at least a certain amount of time — and “for at least a certain 
amount of time” is all we should expect, since the principle is local. 
But after a while, the charged particle will start to experience a 


²⁸ The first detailed calculation appears to have been by Cécile and Bryce
DeWitt, “Falling Charges,” Physics 1 (1964) 3. This paper is unfortunately
very difficult to obtain now. A more recent treatment by Grøn and Næss is
accessible at arxiv.org/abs/0806.0464v1. A full exposition of the techniques
is given by Poisson, “The Motion of Point Particles in Curved Spacetime,”
www.livingreviews.org/lrr-2004-6.

²⁹ Because relativity describes gravitational fields in terms of curvature of
spacetime, the Euclidean relationship between the radius and circumference of
a circle fails here. The r coordinate should be understood here not as the radius
measured from the center but as the circumference divided by $2\pi$.


l / Chiao’s paradox: a charged
particle and a neutral particle are 
in orbit around the earth. Will the 
charged particle radiate, violating 
the equivalence principle? 
m / 1. A photon is emitted
upward from the floor of the ele-
vator. The elevator accelerates
upward. 2. By the time the
photon is detected at the ceiling,
the elevator has changed its
velocity, so the photon is detected
with a Doppler shift.
n / An electromagnetic wave
strikes an ohmic surface. The
wave’s electric field excites an
oscillating current density J. The
wave’s magnetic field then acts
on these currents, producing
a force in the direction of the
wave’s propagation. This is a
pre-relativistic argument that
light must possess inertia. The
first experimental confirmation of
this prediction is shown in figure
o. See Nichols and Hull, “The
pressure due to radiation,” Phys.
Rev. (Series I) 17 (1903) 26.
back-reaction from its own electromagnetic fields, and this causes
its orbit to decay, satisfying conservation of energy. Since Chiao’s 
particles are orbiting the earth, and the earth is not a black hole, the 
mechanism clearly can’t be as simple as the one described above, 
but Grøn and Næss show that there are similar mechanisms that
can apply here, e.g., scattering of light waves by the nonuniform 
gravitational field. 


It is worth keeping in mind the DeWitts’ caution that “The ques- 
tions answered by this investigation are of conceptual interest only, 
since the forces involved are far too small to be detected experimen- 
tally” (see problem 8, p. 39). 
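
A crude order-of-magnitude estimate suggests how small the effect is. The sketch below applies the flat-spacetime Larmor formula to a single electron with an acceleration of about 9 m/s², roughly that of a low orbit. Both the use of the Larmor formula in this setting and the numbers are my own simplifying assumptions, not the DeWitts’ calculation.

```python
import math

E_CHARGE = 1.602e-19   # C, elementary charge
EPSILON_0 = 8.854e-12  # F/m, vacuum permittivity
C = 2.998e8            # m/s, speed of light


def larmor_power(charge_c, accel_m_s2):
    """Flat-spacetime Larmor formula: P = q^2 a^2 / (6 pi epsilon_0 c^3)."""
    return charge_c**2 * accel_m_s2**2 / (6.0 * math.pi * EPSILON_0 * C**3)


if __name__ == "__main__":
    # Assumed acceleration of about 9 m/s^2 for a particle in low earth orbit.
    print(f"radiated power ~ {larmor_power(E_CHARGE, 9.0):.1e} W")
```

Under these assumptions the power is of order $10^{-52}$ W, which is far below anything one could hope to measure.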


1.5.5 Gravitational red-shifts 


Starting on page 15, we saw experimental evidence that the rate 
of flow of time changes with height in a gravitational field. We can 
now see that this is required by the equivalence principle. 


By the equivalence principle, there is no way to tell the difference 
between experimental results obtained in an accelerating laboratory 
and those found in a laboratory immersed in a gravitational field. 
In a laboratory accelerating upward, a photon emitted from the
floor would be Doppler-shifted toward lower frequencies when
observed at the ceiling, because of the change in the receiver’s ve-
locity during the photon’s time of flight. The effect is given by
$\Delta E/E = \Delta f/f = ay/c^2$, where a is the lab’s acceleration, y is the
height from floor to ceiling, and c is the speed of light.


Self-check: Verify this statement. 


By the equivalence principle, we find that when such an experi-
ment is done in a gravitational field g, there should be a gravitational
effect on the energy of a photon equal to $\Delta E/E = gy/c^2$. Since the
quantity gy is the gravitational potential (gravitational energy per
unit mass), the photon’s fractional loss of energy is the same as the
(Newtonian) loss of energy experienced by a material object of mass
m and initial kinetic energy $mc^2$.
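
The size of this effect in an earthbound laboratory can be estimated with a one-line calculation. The 20 m height is a hypothetical value chosen for illustration, and the function name is my own.

```python
C = 2.998e8    # m/s, speed of light
G_FIELD = 9.8  # m/s^2, approximate gravitational field at the earth's surface


def fractional_energy_shift(field_m_s2, height_m):
    """Fractional change in photon energy, g*y/c^2, for a photon rising through
    height_m in a uniform field (or in a lab accelerating at field_m_s2)."""
    return field_m_s2 * height_m / C**2


if __name__ == "__main__":
    shift = fractional_energy_shift(G_FIELD, 20.0)
    print(f"fractional shift over 20 m: {shift:.2e}")
```

The result is a couple of parts in $10^{15}$, which gives a sense of the precision needed to observe the shift over laboratory distances.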


The interpretation is as follows. Classical electromagnetism re- 
quires that electromagnetic waves have inertia. For example, if a 
plane wave strikes an ohmic surface, as in figure n, the wave’s elec- 
tric field excites oscillating currents in the surface. These currents 
then experience a magnetic force from the wave’s magnetic field, 
and application of the right-hand rule shows that the resulting force 
is in the direction of propagation of the wave. Thus the light wave 
acts as if it has momentum. The equivalence principle says that 
whatever has inertia must also participate in gravitational interac- 
tions. Therefore light waves must have weight, and must lose energy 
when they rise through a gravitational field. 


³⁰ Problem 4 on p. 38 verifies, in one specific example, that this way of stating
the equivalence principle is implied by the one on p. 21.
Self-check: Verify the application of the right-hand rule
described above.


Further interpretation:


The quantity $mc^2$ is famous, even among people who don’t
know what m and c stand for. This is the first hint of where
it comes from. The full story is given in section 4.2.2.


The relation p = E/c between the energy and momentum
of a light wave follows directly from Maxwell’s equations, by
the argument above; however, we will see in section 4.2.2 that
according to relativity this relation must hold for any massless
particle.


What we have found agrees with Niels Bohr’s correspondence
principle, which states that when a new physical theory, such
as relativity, replaces an older one, such as Newtonian physics,
the new theory must agree with the old one under the experimental
conditions in which the old theory had been verified by
experiments. The gravitational mass of a beam of light with
energy E is $E/c^2$, and since c is a big number, it is not
surprising that the weight of light rays had never been detected
before Einstein’s prediction prompted attempts to detect it.


This book describes one particular theory of gravity, Einstein’s
theory of general relativity. There are other theories of gravity,
and some of these, such as the Brans-Dicke theory, do
just as well as general relativity in agreeing with the presently
available experimental data. Our prediction of gravitational
Doppler shifts of light only depended on the equivalence principle,
which is one ingredient of general relativity. Experimental
tests of this prediction only test the equivalence principle; they
do not allow us to distinguish between one theory of gravity
and another if both theories incorporate the equivalence
principle.


If an object such as a radio transmitter or an atom in an
excited state emits an electromagnetic wave with a frequency f,
then the object can be considered to be a type of clock. We
can therefore interpret the gravitational red-shift as a
gravitational time dilation: a difference in the rate at which time
itself flows, depending on the gravitational potential. This is
consistent with the empirical results presented in section 1.2.1,
p. 15.


Chiao’s paradox revisited Example: 3


The equivalence principle says that electromagnetic waves have
gravitational mass as well as inertial mass, so it seems clear that
the same must hold for static fields. In Chiao’s paradox (p. 39), the
orbiting charged particle has an electric field that extends out to
infinity. When we measure the mass of a charged particle such as
an electron, there is no way to separate the mass of this field from
a more localized contribution. The electric field “falls” through the
gravitational field, and the equivalence principle, which is local,
cannot guarantee that all parts of the field rotate uniformly about
the earth, even in distant parts of the universe. The electric field
pattern becomes distorted, and this distortion causes a radiation
reaction which back-reacts on the particle, causing its orbit to
decay.


o / A simplified drawing of the 1903 experiment by Nichols
and Hull that verified the predicted momentum of light waves.
Two circular mirrors were hung from a fine quartz fiber, inside
an evacuated bell jar. A 150 mW beam of light was shone
on one of the mirrors for 6 s, producing a tiny rotation, which
was measurable by an optical lever (not shown). The force was
within 0.6% of the theoretically predicted value of 0.001 μN.
For comparison, a short clipping of a single human hair weighs
~ 1 μN.
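

As a rough check on the 0.001 μN figure quoted in the caption of the Nichols-Hull drawing, the relation p = E/c can be turned into a force estimate. The sketch below assumes a perfectly reflecting mirror at normal incidence, so that each unit of energy E delivers momentum 2E/c; the function name and the reflectivity parameter are illustrative choices, not part of the original experiment's analysis.

```python
# Radiation-pressure force of a light beam, from p = E/c.
# Assumption: normal incidence; reflectivity 1 means a perfect mirror
# (momentum transfer 2E/c), reflectivity 0 means a perfect absorber (E/c).

C = 299_792_458.0  # speed of light in m/s


def radiation_force(power_watts, reflectivity=1.0):
    """Return the force in newtons exerted by a beam of the given power."""
    return (1.0 + reflectivity) * power_watts / C


force_newtons = radiation_force(0.150)        # 150 mW beam on a mirror
force_micronewtons = force_newtons * 1e6
print(f"force = {force_micronewtons:.4f} micronewtons")
```

With these assumptions the result comes out close to 0.001 μN, the same order as the value in the caption.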


1.5.6 The Pound-Rebka experiment 


The 1959 Pound-Rebka experiment at Harvard³¹ was one of the
first high-precision, relativistic tests of the equivalence principle to 
be carried out under controlled conditions, and in this section we 
will discuss it in detail. 


When y is on the order of magnitude of the height of a building,
the value of $\Delta E/E = gy/c^2$ is $\sim 10^{-14}$, so an extremely
high-precision experiment is necessary in order to detect a gravitational
red-shift. A number of other effects are big enough to obscure it en-
tirely, and must somehow be eliminated or compensated for. These 
are listed below, along with their orders of magnitude in the exper- 
imental design finally settled on by Pound and Rebka. 


³¹ Phys. Rev. Lett. 4 (1960) 337
(1) Classical Doppler broadening due to temperature. Thermal
motion causes Doppler shifts of emitted photons, corresponding to
the random component of the emitting atom’s velocity vector along
the direction of emission.
Order of magnitude: $\sim 10^{-6}$

(2) The recoil Doppler shift. When an atom emits a photon with
energy E and momentum p = E/c, conservation of momentum requires
that the atom recoil with momentum p = −E/c and energy $p^2/2m$.
This causes a downward Doppler shift of the energy of the emitted
photon. A similar effect occurs on absorption, doubling the problem.
Order of magnitude: $\sim 10^{-12}$

(3) Natural line width. The Heisenberg uncertainty principle says
that a state with a half-life τ must have an uncertainty in its
energy of at least $\sim h/\tau$, where h is Planck’s constant.
Order of magnitude: $\sim 10^{-12}$

(4) Special-relativistic Doppler shift due to temperature. Section
1.2 presented experimental evidence that time flows at a different
rate depending on the motion of the observer. Therefore the thermal
motion of an atom emitting a photon has an effect on the frequency
of the photon, even if the atom’s motion is not along the line of
emission. The equations needed in order to calculate this effect
will not be derived until section 2.2; a quantitative estimate is
given in example 13 on page 61. For now, we only need to know that
this leads to a temperature-dependence in the average frequency of
emission, in addition to the broadening of the bell curve described
by effect (1) above.
Order of magnitude: $\sim 10^{-14}$ per degree C


The most straightforward way to mitigate effect (1) is to use 
photons emitted from a solid. At first glance this would seem like 
a bad idea, since electrons in a solid emit a continuous spectrum of 
light, not a discrete spectrum like the ones emitted by gases; this 
is because we have N electrons, where N is on the order of Avo- 
gadro’s number, all interacting strongly with one another, so by the 
correspondence principle the discrete quantum-mechanical behavior 
must be averaged out. But the protons and neutrons within one 
nucleus do not interact much at all with those in other nuclei, so 
the photons emitted by a nucleus do have a discrete spectrum. The 
energy scale of nuclear excitations is in the keV or MeV range, so 
these photons are x-rays or gamma-rays. Furthermore, the time-scale
of the random vibrations of a nucleus in a solid is extremely
short. For a velocity on the order of 100 m/s, and vibrations with an
amplitude of $\sim 10^{-10}$ m, the time is about $10^{-12}$ s. In many cases,
this is much shorter than the half-life of the excited nuclear state
emitting the gamma-ray, and therefore the Doppler shift averages
out to nearly zero.


p / The experiment.


q / Emission of 14 keV gamma-rays by ⁵⁷Fe. The parent nucleus
⁵⁷Co absorbs an electron and undergoes a weak-force decay
process that converts it into ⁵⁷Fe, in an excited state. With 85%
probability, this state decays to a state just above the ground state,
with an excitation energy of 14 keV and a half-life of $10^{-7}$ s. This
state finally decays, either by gamma emission or emission of
an internal conversion electron, to the ground state.


r / Top: A graph of velocity versus time for the source. The
velocity has both a constant component and an oscillating one
with a frequency of 10-50 Hz. The constant component $v_0$ was
used as a way of determining the calibration of frequency shift as a
function of count rates. Data were acquired during the quarter-cycle
periods of maximum oscillatory velocity, 1 and 2. Bottom: Count
rates as a function of velocity, for $v_0 = 0$ and $v_0 \neq 0$. The
dashed curve and black circles represent the count rates that
would have been observed if there were no gravitational effect.
The gravitational effect shifts the resonance curve to one side
(solid curve), resulting in an asymmetry of the count rates
(open circles). The shift, and the resulting asymmetry, are greatly
exaggerated for readability; in reality, the gravitational effect was
500 times smaller than the width of the resonance curve.


Effect (2) is still much bigger than the $10^{-14}$ size of the effect to
be measured. It can be avoided by exploiting the Mössbauer effect,
in which a nucleus in a solid substance at low temperature emits or
absorbs a gamma-ray photon, but with significant probability the
recoil is taken up not by the individual nucleus but by a vibration
of the atomic lattice as a whole. Since the recoil energy varies as
$p^2/2m$, the large mass of the lattice leads to a very small dissipation
of energy into the recoiling lattice. Thus if a photon is emitted and
absorbed by identical nuclei in a solid, and for both emission and
absorption the recoil momentum is taken up by the lattice as a
whole, then there is a negligible energy shift. One must pick an
isotope that emits photons with energies of about 10-100 keV.
X-rays with energies lower than about 10 keV tend to be absorbed
strongly by matter and are difficult to detect, whereas for
gamma-ray energies $\gtrsim 100$ keV the Mössbauer effect does not
eliminate the recoil effect completely enough.


If the Mössbauer experiment is carried out in a horizontal plane,
resonant absorption occurs. When the source and absorber are aligned
vertically, as in figure p, gravitational frequency shifts should
cause a mismatch, destroying the resonance. One can move the source
at a small velocity (typically a few mm/s) in order to add a Doppler
shift onto the frequency; by determining the velocity that compensates
for the gravitational effect, one can determine how big the
gravitational effect is.
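

To get a feeling for the size of the compensating velocity, one can equate the first-order Doppler shift v/c to the gravitational shift gy/c². The following is a minimal sketch under stated assumptions: g = 9.8 m/s² and the height y = 22.6 m are taken as illustrative inputs, and the function name is hypothetical.

```python
# Source speed whose first-order Doppler shift cancels the gravitational
# shift: v/c = g*y/c**2, so v = g*y/c.
# Assumptions: uniform field g = 9.8 m/s^2, height y = 22.6 m.

C = 299_792_458.0  # speed of light in m/s
G = 9.8            # gravitational field in m/s^2 (assumed value)


def compensating_velocity(height_m, g=G):
    """Return the source speed in m/s that offsets the shift g*y/c**2."""
    return g * height_m / C


v_comp = compensating_velocity(22.6)
print(f"compensating speed = {v_comp * 1e6:.2f} micrometers per second")
```

Under these assumptions the speed is somewhat below one micrometer per second, far smaller than the mm/s scale mentioned above.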


The typical half-life for deexcitation of a nucleus by emission
of a gamma-ray with energy E is in the nanosecond range. To
measure a gravitational effect at the $10^{-14}$ level, one would like to
have a natural line width, (3), with $\Delta E/E \lesssim 10^{-14}$, which would
require a half-life of $\gtrsim 10$ μs. In practice, Pound and Rebka found
that other effects, such as (4) and electron-nucleus interactions that
depended on the preparation of the sample, tended to put nuclei
in one sample “out of tune” with those in another sample at the
$10^{-13}$-$10^{-12}$ level, so that resonance could not be achieved unless the
natural line width gave $\Delta E/E \gtrsim 10^{-12}$. As a result, they settled on
an experiment in which 14 keV gammas were emitted by ⁵⁷Fe nuclei
(figure q) at the top of a 22-meter tower, and absorbed by ⁵⁷Fe
nuclei at the bottom. The 100-ns half-life of the excited state leads
to $\Delta E/E \sim 10^{-12}$. This is 500 times greater than the gravitational
effect to be measured, so, as described in more detail below, the
experiment depended on high-precision measurements of small
up-and-down shifts of the bell-shaped resonance curve.
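

The orders of magnitude in this paragraph can be reproduced from the uncertainty-principle estimate ΔE ~ h/τ. The sketch below is only an order-of-magnitude illustration: it uses h rather than ħ and ignores factors of order unity, and the function and variable names are hypothetical.

```python
# Order-of-magnitude natural line width, Delta E / E ~ h / (tau * E).
# Assumption: factors of order unity (h versus hbar, half-life versus
# mean life) are ignored, as in the estimate in the text.

H_EV_S = 4.135667696e-15  # Planck's constant in eV s


def fractional_line_width(half_life_s, photon_energy_ev):
    """Return the rough fractional width h / (tau * E)."""
    return H_EV_S / (half_life_s * photon_energy_ev)


# 14 keV transition with a 100 ns half-life
width_fe57 = fractional_line_width(100e-9, 14e3)

# Half-life that would be needed for a fractional width of 1e-14
required_half_life_s = H_EV_S / (1e-14 * 14e3)

print(f"fractional width ~ {width_fe57:.1e}")
print(f"half-life needed for 1e-14 ~ {required_half_life_s:.1e} s")
```

Both numbers land in the decades quoted above: a width of order 10⁻¹² for the 100-ns state, and a required half-life of order 10 μs.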


The absorbers were seven iron films isotopically enhanced in
⁵⁷Fe, applied directly to the faces of seven sodium-iodide
scintillation detectors (bottom of figure p). When a gamma-ray impinges
on the absorbers, a number of different things can happen, of which
we can get away with considering only the following: (a) the
gamma-ray is resonantly absorbed in one of the ⁵⁷Fe absorbers, after which
the excited nucleus decays by re-emission of another such photon (or
a conversion electron), in a random direction; (b) the gamma-ray
passes through the absorber and then produces ionization directly
in the sodium iodide crystal. In case (b), the gamma-ray is detected.
In case (a), there is a 50% probability that the re-emitted photon will
come out in the upward direction, so that it cannot be detected.
Thus when the conditions are right for resonance, a reduction in
count rate is expected. The Mössbauer effect never occurs with
100% probability; in this experiment, about a third of the gammas
incident on the absorbers were resonantly absorbed.


The choice of y = 22 m was dictated mainly by systematic errors.
The experiment was limited by the strength of the gamma-ray
source. For a source of a fixed strength, the count rate in the
detector at a distance y would be proportional to $y^{-2}$, leading to
statistical errors proportional to $1/\sqrt{\text{count rate}} \propto y$. Since the effect
to be measured is also proportional to y, the signal-to-noise ratio
was independent of y. However, systematic effects such as (4) were
easier to monitor and account for when y was fairly large. A lab
building at Harvard happened to have a 22-meter tower, which was
used for the experiment. To reduce the absorption of the gammas
in the 22 meters of air, a long, cylindrical mylar bag full of helium
gas was placed in the shaft (figure p).
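

The scaling argument for the signal-to-noise ratio can be spelled out in a few lines. This is a toy model, not a description of the actual data analysis: the source strength and run time are arbitrary units, and all names are hypothetical.

```python
import math

# Toy model of the scaling argument: counts fall off as 1/y**2, the
# statistical error goes as 1/sqrt(counts), and the signal grows as y.
# Units are arbitrary; only the dependence on y matters here.


def relative_snr(y, source_strength=1.0, run_time=1.0):
    """Return signal / statistical error, up to a constant factor."""
    counts = source_strength * run_time / y**2  # inverse-square law
    stat_error = 1.0 / math.sqrt(counts)        # proportional to y
    signal = y                                  # gravitational shift ~ y
    return signal / stat_error


for height in (5.0, 22.0, 100.0):
    print(height, relative_snr(height))
```

In this model the ratio is the same for every height, which is the sense in which the choice of y had to be settled by systematic rather than statistical considerations.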


The resonance was a bell-shaped curve with a minimum at the 
natural frequency of emission. Since the curve was at a minimum, 
where its derivative was zero, the sensitivity of the count rate to 
the gravitational shift would have been nearly zero if the source had 
been stationary. Therefore it was necessary to vibrate the source 
up and down, so that the emitted photons would be Doppler shifted 
onto the shoulders of the resonance curve, where the slope of the 
curve was large. The resulting asymmetry in count rates is shown 
in figure r. A further effort to cancel out possible systematic effects 
was made by frequently swapping the source and absorber between 
the top and bottom of the tower. 


For y = 22.6 m, the equivalence principle predicts a fractional
frequency shift due to gravity of $2.46 \times 10^{-15}$. Pound and Rebka
measured the shift to be $(2.56 \pm 0.25) \times 10^{-15}$. The results were in
statistical agreement with theory, and verified the predicted size of
the effect to a precision of 10%.
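

The predicted shift and its comparison with the measurement can be checked directly from gy/c². The sketch below assumes g = 9.8 m/s²; the variable names are illustrative.

```python
# Predicted gravitational frequency shift g*y/c**2 for y = 22.6 m,
# compared with the measured value quoted in the text.
# Assumption: g = 9.8 m/s^2.

C = 299_792_458.0  # speed of light in m/s
G = 9.8            # gravitational field in m/s^2 (assumed value)

predicted_shift = G * 22.6 / C**2

measured_shift = 2.56e-15
measured_error = 0.25e-15

# Discrepancy in units of the quoted uncertainty
n_sigma = (measured_shift - predicted_shift) / measured_error

# Fractional precision of the measurement
relative_precision = measured_error / measured_shift

print(f"predicted = {predicted_shift:.2e}")
print(f"discrepancy = {n_sigma:.2f} standard deviations")
print(f"precision = {relative_precision:.0%}")
```

The prediction and measurement differ by well under one standard deviation, and the quoted uncertainty is about a tenth of the measured value.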


s / Pound and Rebka at the top and bottom of the tower.


Problems


1 In classical mechanics, one hears the term “the acceleration of 
gravity,” which doesn’t literally make sense, since it is objects that 
accelerate. Explain why this term’s usefulness is dependent on the 
equivalence principle. 


2 The New Horizons space probe communicates with the earth 
using microwaves with a frequency of about 10 GHz. Estimate the 
sizes of the following frequency shifts in this signal, when the probe 
flies by Pluto in 2015, at a velocity of ~ 10 A.U./year: (a) the 
Doppler shift due to the probe’s velocity; (b) the Doppler shift due 
to the Earth’s orbital velocity; (c) the gravitational Doppler shift. 


3 Euclid’s axioms E1-E5 (p. 18) do not suffice to prove that 
there are an infinite number of points in the plane, and therefore 
they need to be supplemented by an extra axiom that states this 
(unless one finds the nonstandard realizations with finitely many 
points to be interesting enough to study for their own sake). Prove 
that the axioms of ordered geometry O1-O4 on p. 19 do not have
this problem. > Solution, p. 386 


4 In the science fiction novel Have Spacesuit — Will Travel, by 
Robert Heinlein, Kip, a high school student, answers a radio distress 
call, encounters a flying saucer, and is knocked out and kidnapped 
by aliens. When he wakes up, he finds himself in a locked cell 
with a young girl named Peewee. Peewee claims they’re aboard an 
accelerating spaceship. “If this was a spaceship,” Kip thinks. “The 
floor felt as solid as concrete and motionless.” 


The equivalence principle can be stated in a variety of ways. On 
p. 21, I stated it as (1) gravitational and inertial mass are always 
proportional to one another. An alternative formulation (p. 32) 
is (2) that Kip has no way, by experiments or observations inside
his sealed prison cell, to determine whether he’s in an accelerating 
spaceship or on the surface of a planet, experiencing its gravitational 
field. 


(a) Show that any violation of statement 1 also leads to a violation of 
statement 2. (b) If we’d intended to construct a geometrical theory 
of gravity roughly along the lines of axioms O1-O4 on p. 19, which 
axiom is violated in this scenario? > Solution, p. 386 


5 Clock A sits on a desk. Clock B is tossed up in the air from 
the same height as the desk and then comes back down. Compare 
the elapsed times. > Hint, p. 386 > Solution, p. 386 


6 (a) Find the difference in rate between a clock at the center 
of the earth and a clock at the south pole. (b) When an antenna 
on earth receives a radio signal from a space probe that is in a 
hyperbolic orbit in the outer solar system, the signal will show both 
a kinematic red-shift and a gravitational blueshift. Compare the
orders of magnitude of these two effects. > Solution, p. 386 


7 Consider the following physical situations: (1) a charged 
object lies on a desk on the planet earth; (2) a charged object orbits 
the earth; (3) a charged object is released above the earth’s surface 
and dropped straight down; (4) a charged object is subjected to a 
constant acceleration by a rocket engine in outer space. In each case, 
we want to know whether the charge radiates. Analyze the physics 
in each case (a) based on conservation of energy; (b) by determining 
whether the object’s motion is inertial in the sense intended by Isaac 
Newton; (c) using the most straightforward interpretation of the 
equivalence principle (i.e., not worrying about the issues discussed 
on p. that surround the ambiguous definition of locality). 
> Solution, p. 387 


8 Consider the physical situation depicted in figure 1, p. 31.
Let $a_g$ be the gravitational acceleration and $a_r$ the acceleration of
the charged particle due to radiation. Then $a_r/a_g$ measures the
violation of the equivalence principle. The goal of this problem is to
make an order-of-magnitude estimate of this ratio in the case of a
neutron and a proton in low earth orbit.
(a) Let m be the mass of each particle, and q the charge of the charged
particle. Without doing a full calculation like the ones by the
DeWitts and Grøn and Næss, use general ideas about the
frequency-scaling of radiation (see section 9.2.5, p. 381) to find the
proportionality that gives the dependence of $a_r/a_g$ on q, m, and any
convenient parameters of the orbit.
(b) Based on considerations of units, insert the necessary universal 
constants into your answer from part a. 
(c) The result from part b will still be off by some unitless factor, but 
we expect this to be of order unity. Under this assumption, make 
an order-of-magnitude estimate of the violation of the equivalence 
principle in the case of a neutron and a proton in low earth orbit. 
> Solution, p. 387 
Geometry of Flat 
Spacetime 


The geometrical treatment of space, time, and gravity only requires 
as its basis the equivalence of inertial and gravitational mass. Given 
this assumption, we can describe the trajectory of any free-falling 
test particle as a geodesic. Equivalence of inertial and gravitational 
mass holds for Newtonian gravity, so it is indeed possible to redo 
Newtonian gravity as a theory of curved spacetime. This project was 
carried out by the French mathematician Cartan. The geometry of 
the local reference frames is very simple. The three space dimensions 
have an approximately Euclidean geometry, and the time dimension 
is entirely separate from them. This is referred to as a Euclidean 
spacetime with 3+1 dimensions. Although the outlook is radically 
different from Newton’s, all of the predictions of experimental results 
are the same. 


The experiments in section 1.2 show, however, that there are 
real, experimentally verifiable violations of Newton’s laws. In New- 
tonian physics, time is supposed to flow at the same rate everywhere, 
which we have found to be false. The flow of time is actually depen- 
dent on the observer’s state of motion through space, which shows 
that the space and time dimensions are intertwined somehow. The 
geometry of the local frames in relativity therefore must not be as 
simple as Euclidean 3+1. Their actual geometry was implicit in 
Einstein’s 1905 paper on special relativity, and had already been 
developed mathematically, without the full physical interpretation, 
by Hendrik Lorentz. Lorentz’s and Einstein’s work were explicitly 
connected by Minkowski in 1907, so a Lorentz frame is often referred 
to as a Minkowski frame. 


To describe this Lorentz geometry, we need to add more structure
on top of the axioms O1-O4 of ordered geometry, but it will not
be the additional Euclidean structure of E3-E4; it will be something
different. To see how to proceed, let’s start by thinking about what
bare minimum of geometrical machinery is needed in order to set 
up frames of reference. 


a / Hendrik Antoon Lorentz (1853-1928)
a / Objects are released at rest at spacetime events P and
Q. They remain at rest, and their world-lines define a notion of
parallelism.


b/There is no well-defined 
angular measure in this ge- 
ometry. In a different frame of 
reference, the angles are not 
right angles. 


c/Simultaneity is not well 
defined. The constant-time lines 
PQ and RS from figure a are 
not constant-time lines when 
observed in a different frame of 
reference. 


2.1 Affine properties of Lorentz geometry 


2.1.1 Parallelism and measurement 


We think of a frame of reference as a body of measurements 
or possible measurements to be made by some observer. Ordered 
geometry lacks measure. The following argument shows that merely 
by adding a notion of parallelism to our geometry, we automatically 
gain a system of measurement. 


We only expect Lorentz frames to be local, but we do need them 
to be big enough to cover at least some amount of spacetime. If 
Betty does an Eötvös experiment by releasing a pencil and a lead
ball side by side, she is essentially trying to release them at the
same event A, so that she can observe them later and determine
whether their world-lines stay right on top of one another at point
B. That was all that was required for the Eötvös experiment, but
in order to set up a Lorentz frame we need to start dealing with 
objects that are not right on top of one another. Suppose we re- 
lease two lead balls in two different locations, at rest relative to one 
another. This could be the first step toward adding measurement 
to our geometry, since the balls mark two points in space that are 
separated by a certain distance, like two marks on a ruler, or the 
goals at the ends of a soccer field. Although the balls are separated 
by some finite distance, they are still close enough together so that 
if there is a gravitational field in the area, it is very nearly the same 
in both locations, and we expect the distance defined by the gap 
between them to stay the same. Since they are both subject only to 
gravitational forces, their world-lines are by definition straight lines 
(geodesics). The goal here is to end up with some kind of coordi- 
nate grid defining a (t,x) plane, and on such a grid, the two balls’ 
world-lines are vertical lines. If we release them at events P and 
Q, then observe them again later at R and S, PQRS should form a 
rectangle on such a plot. In the figure, the irregularly spaced tick 
marks along the edges of the rectangle are meant to suggest that 
although ordered geometry provides us with a well-defined ordering 
along these lines, we have not yet constructed a complete system of 
measurement. 


The depiction of PQSR as a rectangle, with right angles at its 
vertices, might lead us to believe that our geometry would have 
something like the concept of angular measure referred to in Euclid’s 
E4, equality of right angles. But this is too naive even for the 
Euclidean 3+1 spacetime of Newton and Galileo. Suppose we switch 
to a frame that is moving relative to the first one, so that the balls 
are not at rest. In the Euclidean spacetime, time is absolute, so 
events P and Q would remain simultaneous, and so would R and 
S; the top and bottom edges PQ and RS would remain horizontal 
on the plot, but the balls’ world-lines PR and QS would become 
slanted. The result would be a parallelogram. Since observers in
different states of motion do not agree on what constitutes a right
angle, the concept of angular measure is clearly not going to be 
useful here. Similarly, if Euclid had observed that a right angle 
drawn on a piece of paper no longer appeared to be a right angle 
when the paper was turned around, he would never have decided 
that angular measure was important enough to be enshrined in E4. 
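

The Galilean case described above can be illustrated numerically. The sketch assumes the standard Galilean boost t' = t, x' = x − vt, which the text has not written out explicitly, and uses arbitrary units for the events P, Q, R, and S; all names are illustrative.

```python
# Galilean boost applied to the rectangle PQRS in the (t, x) plane.
# Assumption: the boost is t' = t, x' = x - v*t, in arbitrary units.


def galilean_boost(event, v):
    """Transform an event (t, x) to a frame moving at velocity v."""
    t, x = event
    return (t, x - v * t)


def displacement(a, b):
    """Return the (dt, dx) displacement from event a to event b."""
    return (b[0] - a[0], b[1] - a[1])


# Balls released at P and Q, observed later at R and S.
P, Q, R, S = (0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)
Pb, Qb, Rb, Sb = (galilean_boost(e, 0.5) for e in (P, Q, R, S))

# PQ and RS stay simultaneous (dt = 0): time is absolute.
print(displacement(Pb, Qb), displacement(Rb, Sb))
# World-lines PR and QS are now slanted (dx != 0) but still parallel.
print(displacement(Pb, Rb), displacement(Qb, Sb))
```

Opposite sides remain equal as displacements, so the figure is still a parallelogram, but the vertical world-lines of the original frame have become slanted, so the right angles are gone.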


In the context of relativity, where time is not absolute, there is 
not even any reason to believe that different observers must agree on 
the simultaneity of PQ and RS. Our observation that time flows
differently depending on the observer’s state of motion tells us
specifically to expect this not to happen when we switch to a frame moving
relative to the first one. Thus in general we expect that PQRS will be
distorted into a form like the one shown in figure c. We do expect, 
however, that it will remain a parallelogram; a Lorentz frame is one 
in which the gravitational field, if any, is constant, so the properties 
of spacetime are uniform, and by symmetry the new frame should 
still have PR=QS and PQ=RS. 


With this motivation, we form the system of affine geometry by 
adding the following axioms to set O1-O4.¹ The notation [PQRS]
means that events P, Q, S, and R form a parallelogram, and is 
defined as the statement that the lines determined by PQ and RS 
never meet at a point, and similarly for PR and QS. 


A1 Constructibility of parallelograms: Given any P, Q, and R,
there exists S such that [PQRS], and if P, Q, and R are distinct 
then S is unique. 


A2 Symmetric treatment of the sides of a parallelogram: If [PQRS], 
then [QRSP], [QPSR], and [PRQS]. 


A3 Lines parallel to the same line are parallel to one another: If 
[ABCD] and [ABEF], then [CDEF]. 


The following theorem is a stronger version of Playfair’s axiom 
E5, the interpretation being that affine geometry describes a space- 
time that is locally flat. 


Theorem: Given any line ℓ and any point P not on the line, there
exists a unique line through P that is parallel to ℓ.


This is stronger than E5, which only guarantees uniqueness, not 
existence. Informally, the idea here is that A1 guarantees the
existence of the parallel, and A3 makes it unique.²


¹ The axioms are summarized for convenient reference in the back of the book
on page 412. This formulation is essentially the one given by Penrose, The Road
to Reality, in section 14.1.

² Proof: Pick any two distinct points A and B on ℓ, and construct the uniquely
determined parallelogram [ABPQ] (axiom A1). Points P and Q determine a line
(axiom O1), and this line is parallel to ℓ (definition of the parallelogram). To


d / Construction of an affine parameter.


e / Affine geometry gives a well-defined centroid for the triangle.


Although these new axioms do nothing more than to introduce
the concept of parallelism lacking in ordered geometry, it turns out
that they also allow us to build up a concept of measurement. Let
ℓ be a line, and suppose we want to define a number system on this
line that measures how far apart events are. Depending on the type
of line, this could be a measurement of time, of spatial distance, or
a mixture of the two. First we arbitrarily single out two distinct
points on ℓ and label them 0 and 1. Next, pick some auxiliary point
$q_0$ not lying on ℓ. By A1, construct the parallelogram $0 1 q_0 q_1$. Next
construct $q_0 1 q_1 2$. Continuing in this way, we have a scaffolding of
parallelograms adjacent to the line, determining an infinite lattice of
points 1, 2, 3, ... on the line, which represent the positive integers.
Fractions can be defined in a similar way. For example, $\frac{1}{2}$ is defined
as the point such that when the initial lattice segment from 0 to
$\frac{1}{2}$ is extended by the same construction, the next point on the
lattice is 1.
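

The scaffolding construction can be mimicked in code. In the sketch below, points are represented by coordinate pairs purely as a computational convenience; the axioms themselves make no reference to coordinates, and the function names are hypothetical. The parallelogram [PQRS] is completed using S = Q + R − P.

```python
# Building the lattice points 0, 1, 2, ... on a line using only the
# parallelogram construction of axiom A1.
# Assumption: points are modeled as coordinate pairs in some chart.


def fourth_vertex(p, q, r):
    """Return S such that [PQRS]: PQ parallel to RS, PR parallel to QS."""
    return tuple(qi + ri - pi for pi, qi, ri in zip(p, q, r))


def lattice_points(zero, one, q0, count):
    """Return the points labeled 0, 1, ..., count on the line."""
    points = [zero, one]
    q_prev = q0
    while len(points) <= count:
        # Parallelogram (n-1) n q_(n-1) q_n gives the next auxiliary point.
        q_next = fourth_vertex(points[-2], points[-1], q_prev)
        # Parallelogram q_(n-1) n q_n (n+1) gives the next lattice point.
        points.append(fourth_vertex(q_prev, points[-1], q_next))
        q_prev = q_next
    return points


print(lattice_points((0, 0), (3, 1), (1, 5), 3))
print(lattice_points((0, 0), (3, 1), (-2, 7), 3))
```

The two calls use different auxiliary points but produce the same lattice, illustrating that the choice of the auxiliary point does not affect the resulting affine parameter.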


The continuously varying variable constructed in this way is 
called an affine parameter. The time measured by a free-falling 
clock is an example of an affine parameter, as is the distance mea- 
sured by the tick marks on a free-falling ruler. Since light rays travel 
along geodesics, the wave crests on a light wave can even be used 
analogously to the ruler’s tick marks. 


Centroids Example: 1 
The affine parameter can be used to define the centroid of a set 
of points. In the simplest example, finding the centroid of two 
points, we simply bisect the line segment as described above in 
the construction of the number $\frac{1}{2}$. Similarly, the centroid of a
triangle can be defined as the intersection of its three medians, the
lines joining each vertex to the midpoint of the opposite side.
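
A short numerical check can make it plausible that the centroid is an affine notion. The sketch below represents points by coordinate pairs (an assumption made only for computation) and applies an arbitrary invertible affine map, chosen here purely for illustration, to confirm that the centroid of the transformed triangle is the transform of the centroid.

```python
from fractions import Fraction

# Centroid of a triangle as the common point of its medians, and a check
# that the construction commutes with an affine change of coordinates.
# Assumption: points are coordinate pairs; the map below is arbitrary.


def midpoint(a, b):
    return tuple(Fraction(x + y, 2) for x, y in zip(a, b))


def centroid(a, b, c):
    return tuple(Fraction(x + y + z, 3) for x, y, z in zip(a, b, c))


def point_on_median(a, b, c):
    """Point two thirds of the way from vertex a to the midpoint of bc."""
    m = midpoint(b, c)
    return tuple(x + Fraction(2, 3) * (mx - x) for x, mx in zip(a, m))


def affine_map(p):
    """A hypothetical invertible affine map (shear, stretch, and shift)."""
    x, y = p
    return (2 * x + y + 1, x + 3 * y - 2)


A, B, C = (0, 0), (6, 0), (0, 3)
g = centroid(A, B, C)
print(g, point_on_median(A, B, C), point_on_median(B, C, A))
print(centroid(affine_map(A), affine_map(B), affine_map(C)), affine_map(g))
```

All three medians pass through the same point, and that point is carried along by the affine map, even though the map changes lengths and angles.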


Conservation of momentum Example: 2 
In nonrelativistic mechanics, the concept of the center of mass 
is closely related to the law of conservation of momentum. For 
example, a logically complete statement of the law is that if a sys- 
tem of particles is not subjected to any external force, and we 
pick a frame in which its center of mass is initially at rest, then its 
center of mass remains at rest in that frame. Since centroids are 
well defined in affine geometry, and Lorentz frames have affine 
properties, we have grounds to hope that it might be possible to 
generalize the definition of momentum relativistically so that the 
generalized version is conserved in a Lorentz frame. On the other 
hand, we don’t expect to be able to define anything like a global 


prove that this line is unique, we argue by contradiction. Suppose some other
parallel m exists. If m crosses the infinite line BQ at some point Z, then both
[ABPQ] and [ABPZ], so by A1, Q=Z, so PQ and m are the same line. The only
other possibility is that m is parallel to BQ, but then the following chain of
parallelisms holds: PQ ∥ AB ∥ m ∥ BQ. By A3, lines parallel to another line are
parallel to each other, so PQ ∥ BQ, but this is a contradiction, since they have
Q in common.
Lorentz frame for the entire universe, so there is no such natural 
expectation of being able to define a global principle of conser- 
vation of momentum. This is an example of a general fact about 
relativity, which is that conservation laws are difficult or impossible 
to formulate globally. 


Although the affine parameter gives us a system of measurement 
for free in a geometry whose axioms do not even explicitly mention 
measurement, there are some restrictions: 


The affine parameter is defined only along straight lines, i.e., 
geodesics. Alice’s clock defines an affine parameter, but Betty’s 
does not, since it is subject to nongravitational forces. 


We cannot compare distances along two arbitrarily chosen 
lines, only along a single line or two parallel lines. 


The affine parameter is arbitrary not only in the choice of its 
origin 0 (which is to be expected in any case, since any frame 
of reference requires such an arbitrary choice) but also in the 
choice of scale. For example, there is no fundamental way of 
deciding how fast to make a clock tick. 


We will eventually want to lift some of these restrictions by 
adding to our kit a tool called a metric, which allows us to define
distances along arbitrary curves in spacetime, and to compare
distances in different directions. The affine parameter, however, will 
not be entirely superseded. In particular, we’ll find that the metric 
has a couple of properties that are not as nice as those of the affine 
parameter. The square of a metric distance can be negative, and the 
metric distance measured along a light ray is precisely zero, which 
is not very useful. 


Self-check: By the construction of the affine parameter above, 
affine distances on the same line are comparable. By another con- 
struction, verify the claim made above that this can be extended to 
distances measured along two different parallel lines. 


Area and volume Example: 3
It is possible to define area and volume in affine geometry. This 
is a little surprising, since distances along different lines are not 
even comparable. However, we are already accustomed to mul- 
tiplying and dividing numbers that have different units (a concept 
that would have given Euclid conniptions), and the situation in 
affine geometry is really no different. To define area, we extend 
the one-dimensional lattice to two dimensions. Any planar figure 
can be superimposed on such a lattice, and dissected into paral- 
lelograms, each of which has a standard area. 
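
As a rough numerical sketch of the lattice idea, the snippet below measures the area of a figure in units of one lattice parallelogram. It assumes that we have already written the figure's vertices and the two lattice vectors in some coordinate system, which is scaffolding that affine geometry itself does not supply. The names `signed_area`, `cell_area`, `affine_area`, and `linear_map`, and the sample triangle, are hypothetical choices for this illustration. The point is that the ratio is unchanged by an arbitrary linear change of coordinates, so it does not depend on any notion of length or angle.

```python
def signed_area(vertices):
    """Shoelace formula: signed area of a polygon in coordinate units."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0

def cell_area(a, b):
    """Signed area, in coordinate units, of the lattice cell spanned by a and b."""
    return a[0] * b[1] - a[1] * b[0]

def affine_area(vertices, a, b):
    """Area of the polygon in units of one lattice parallelogram."""
    return signed_area(vertices) / cell_area(a, b)

def linear_map(p):
    """An arbitrary invertible linear map (hypothetical example)."""
    x, y = p
    return (2.0 * x + 0.7 * y, 0.3 * x + 0.5 * y)

triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
a, b = (1.0, 0.0), (0.5, 1.0)

before = affine_area(triangle, a, b)
after = affine_area([linear_map(p) for p in triangle], linear_map(a), linear_map(b))
```

Here `before` and `after` agree up to rounding, even though the map distorts both the triangle and the lattice.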


Area on a graph of v versus t Example: 4
If an object moves at a constant velocity v for time t, the distance 


f/ Example 3. The area of 
the viola can be determined 
by counting the parallelograms 
formed by the lattice. The area 
can be determined to any desired 
precision, by dividing the parallel- 
ograms into fractional parts that 
are as small as necessary. 


g / Example 4. 
it travels can be represented by the area of a parallelogram in an
affine plane with sides having lengths v and t. These two lengths 
are measured by affine parameters along two different directions, 
so they are not comparable. For example, it is meaningless to 
ask whether 1 m/s is greater than, less than, or equal to 1 s. If 
we were graphing velocity as a function of time on a conventional 
Cartesian graph, the v and t axes would be perpendicular, but 
affine geometry has no notion of angular measure, so this is irrel- 
evant here. 


Self-check: If multiplication is defined in terms of affine area, 
prove the commutative property ab = ba and the distributive rule 
a(b+c) = ab + ac from axioms A1-A3.


2.1.2 Vectors 
Vectors distinguished from scalars 


We’ve been discussing subjects like the center of mass that in 
freshman mechanics would be described in terms of vectors and 
scalars, the distinction being that vectors have a direction in space 
and scalars don’t. As we make the transition to relativity, we are 
forced to refine this distinction. For example, we used to consider 
time as a scalar, but the Hafele-Keating experiment shows that time
is different in different frames of reference, which isn’t something 
that’s supposed to happen with scalars such as mass or temperature. 
In affine geometry, it doesn’t make much sense to say that a vector 
has a magnitude and direction, since non-parallel magnitudes aren’t 
comparable, and there is no system of angular measurement in which 
to describe a direction. 


A better way of defining vectors and scalars is that scalars are 
absolute, vectors relative. If I have three apples in a bowl, then all 
observers in all frames of reference agree with me on the number 
three. But if my terrier pup pulls on the leash with a certain force 
vector, that vector has to be defined in relation to other things. It 
might be three times the strength of some force that we define as 
one newton, and in the same direction as the earth’s magnetic field. 


In general, measurement means comparing one thing to another. 
The number of apples in the bowl isn’t a measurement, it’s a count. 


Affine measurement of vectors 


Before even getting into the full system of affine geometry, let’s 
consider the one-dimensional example of a line of time. We could 
use the hourly emergence of a mechanical bird from a pendulum- 
driven cuckoo clock to measure the rate at which the earth spins, 
but we could equally well take our planet’s rotation as the standard 
and use it to measure the frequency with which the bird pops out of 
the door. Once we have two things to compare against one another, 
measurement is reduced to counting (figure d, p. 44). Schematically, 
let’s represent this measurement process with the following notation,
which is part of a system called birdtracks:³


$c \rightarrow e = 24$


Here c represents the cuckoo clock and e the rotation of the earth.
Although the measurement relationship is nearly symmetric, the
arrow has a direction, because, for example, the measurement of
the earth’s rotational period in terms of the clock’s frequency is
$c \rightarrow e = (24\ \text{hr})(1\ \text{hr}^{-1}) = 24$, but the clock’s period in terms of the
earth’s frequency is $e \rightarrow c = 1/24$. We say that the relationship is
not symmetric but “dual.” By the way, it doesn’t matter how we
arrange these diagrams on the page. The notations $c \rightarrow e$ and $e \leftarrow c$
mean exactly the same thing, and expressions like this can even be
drawn vertically.


Suppose that e is a displacement along some one-dimensional 
line of time, and we want to think of it as the thing being measured. 
Then we expect that the measurement process represented by c pro- 
duces a real-valued result and is a linear function of e. Since the 
relationship between c and e is dual, we expect that c also belongs 
to some vector space. For example, vector spaces allow multiplica- 
tion by a scalar: we could double the frequency of the cuckoo clock 
by making the bird come out on the half hour as well as on the 
hour, forming 2c. Measurement should be a linear function of both 
vectors; we say it is “bilinear.” 


Duality 


The two vectors c and e have different units, $\text{hr}^{-1}$ and hr, and
inhabit two different one-dimensional vector spaces. The “flavor” of
the vector is represented by whether the arrow goes into it or comes
out. Just as we used notation like $\vec{v}$ in freshman physics to tell
vectors apart from scalars, we can employ arrows in the birdtracks
notation as part of the notation for the vector, so that instead of
writing the two vectors as c and e, we can notate them as $c\rightarrow$ and
$\rightarrow e$. Performing a measurement is like plumbing. We join the two
“pipes” in $c\rightarrow\ \rightarrow e$ and simplify to $c \rightarrow e$.


A confusing and nonstandardized jungle of notation and termi- 
nology has grown up around these concepts. For now, let’s refer to a 
vector such as $\rightarrow e$, with the arrow coming in, simply as a “vector,”
and the type like $c\rightarrow$ as a “dual vector.” In the one-dimensional
example of the earth and the cuckoo clock, the roles played by the 
two vectors were completely equivalent, and it didn’t matter which 
one we expressed as a vector and which as a dual vector. Example 
5 shows that it is sometimes more natural to take one quantity as 


³ The system used in this book follows the one defined by Cvitanović, which
was based closely on a graphical notation due to Penrose. For a more complete
exposition, see the Wikipedia article “Penrose graphical notation” and
Cvitanović’s online book at birdtracks.eu.

h/1. A displacement vector.
2. A vector from the space dual 
to the space of displacements. 
3. Measurement is reduced 
to counting. The cuckoo clock 
chimes 24 times in one rotation 
of the earth. 


i/Constant-temperature curves 
for January in North America, at 
intervals of 4°C. The tempera- 
ture gradient at a given point is a 
dual vector. 


a vector and another as a dual vector. Example 6 shows that we 
sometimes have no choice at all as to which is which. 


Scaling 


In birdtracks notation, a scalar is a quantity that has no external 
arrows at all. Since the expression $c \rightarrow e = 24$ has no external arrows,
only internal ones, it represents a scalar. This makes sense because 
it’s a count, and a count is a scalar. 


A convenient way of summarizing all of our categories of vari- 
ables is by their behavior when we convert units, i.e., when we rescale 
our space. If we switch our time unit from hours to minutes, the 
number of apples in a bowl is unchanged, the earth’s period of ro- 
tation gets 60 times bigger, and the frequency of the cuckoo clock 
changes by a factor of 1/60. In other words, a quantity $u$ under
rescaling of coordinates by a factor $\alpha$ becomes $\alpha^p u$, where the
exponents $p = -1$, $0$, and $+1$ correspond to dual vectors, scalars, and
vectors, respectively. We can therefore see that these distinctions are of
interest even in one dimension, contrary to what one would have ex- 
pected from the freshman-physics concept of a vector as something 
transforming in a certain way under rotations. 
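
The rescaling rule can be illustrated with a tiny sketch. The table `WEIGHT` and the function `rescale` are hypothetical names introduced only for this illustration; the numbers are the ones from the text, with hours converted to minutes so that the factor is 60. Exact fractions are used so that no rounding obscures the bookkeeping.

```python
from fractions import Fraction

# Exponent p in the rule u -> alpha**p * u for each kind of quantity.
WEIGHT = {"dual vector": -1, "scalar": 0, "vector": +1}

def rescale(value, kind, alpha):
    """Numerical value of a quantity after coordinates are rescaled by alpha."""
    return value * Fraction(alpha) ** WEIGHT[kind]

alpha = 60  # hours -> minutes

earth_period = rescale(Fraction(24), "vector", alpha)          # 24 hr becomes 1440 min
clock_frequency = rescale(Fraction(1), "dual vector", alpha)   # 1 per hr becomes 1/60 per min
apples = rescale(Fraction(3), "scalar", alpha)                 # a count is unchanged

# A measurement pairs a dual vector with a vector, so the factors of alpha cancel.
count = clock_frequency * earth_period
```

The final `count` is still 24, as it must be for a scalar: the bird pops out 24 times per rotation no matter which unit of time we use.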


Geometrical visualization 


In two dimensions, there are natural ways of visualizing the dif- 
ferent vector spaces inhabited by vectors and dual vectors. We’ve 
already been describing a vector like $\rightarrow e$ as a displacement. Its
vector space is the space of such displacements.⁴ A vector in the
dual space such as $c\rightarrow$ can be visualized as a set of parallel, evenly
spaced lines on a topographic map, h/2, with an arrowhead to show 
which way is “uphill.” The act of measurement consists of counting 
how many of these lines are crossed by a certain vector, h/3. 
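
The counting picture can be sketched in components. This is only an illustration under stated assumptions: we assume a choice of two lattice directions, describe a displacement by how many steps it takes along each, and describe the dual vector by how many of its lines are crossed per step along each. The names `measure`, `add`, `scale`, and the sample numbers are hypothetical.

```python
def measure(dual, vec):
    """Number of the dual vector's lines crossed by the displacement."""
    return sum(w * d for w, d in zip(dual, vec))

def add(u, v):
    """Componentwise sum of two vectors of the same type."""
    return tuple(a + b for a, b in zip(u, v))

def scale(k, u):
    """Multiply a vector or dual vector by the scalar k."""
    return tuple(k * a for a in u)

contours = (2, 1)   # 2 lines crossed per step along the first direction, 1 along the second
step_a = (3, 0)
step_b = (1, 4)

crossed_a = measure(contours, step_a)
crossed_b = measure(contours, step_b)

# Bilinearity: the count is linear in the displacement and in the dual vector.
crossed_sum = measure(contours, add(step_a, step_b))
crossed_doubled = measure(scale(2, contours), step_a)
```

Making the lines twice as dense doubles the count, and laying two displacements end to end adds their counts, which is what bilinearity of the measurement means.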


Given a scalar field f, its gradient grad f at any given point 
is a dual vector. In birdtracks notation, we have to indicate this 
by writing it with an outward-pointing arrow, $(\operatorname{grad} f)\rightarrow$. Because
gradients occur so frequently, we have a special shorthand for them,
which is simply a circle:

$\text{\textcircled{$f$}}\rightarrow$


In the context of spacetime with a metric and curvature, we'll see 
that the usual definition of the gradient in terms of partial deriva- 
tives should be modified with correction terms to form something 
called a covariant derivative. When we get to that point on p. 178, 
we'll commandeer the circle notation for that operation. 


Force is a dual vector Example: 5 
The dot product $dW = \mathbf{F} \cdot d\mathbf{x}$ for computing mechanical work


⁴ In terms of the primitive notions used in the axiomatization in section 2.1,
a displacement could be described as an equivalence class of segments such that
for any two segments in the class AB and CD, AB and CD form a parallelogram.

becomes, in birdtracks notation,
$dW = F \rightarrow dx$.


This shows that force is more naturally considered to be a dual 
vector rather than a vector. The symmetry between vectors and 
dual vectors is broken by considering displacements like dx to 
be vectors, and this asymmetry then spreads to other quantities 
such as force. 


The same result can be obtained from Newton’s second law; see 
example 21 on p. 141. 


Systems without a metric Example: 6 
The freshman-mechanics way of thinking about vectors and scalar 
products contains the hidden assumption that we have, besides 
affine measurement, an additional piece of measurement appa- 
ratus called the metric (section 3.5, p. 99). Without yet having to 
formally define what we mean by a metric, we can say roughly 
that it supplies the conveniences that we’re used to having in the 
Euclidean plane, but that are not present in affine geometry. In 
particular, it allows us to define the notion that one vector is per- 
pendicular to another vector, or that one dual vector is perpendic- 
ular to another dual vector. 


Let’s start with an example where the hidden assumption is valid,
and we do have a metric. Let a billiard ball of unit mass be constrained
by a diagonal wall to have $C < 0$, where $C = y - x$. The
Lagrangian formalism just leads to the expected Newtonian expressions
for the momenta conjugate to x and y, $p_x = \dot{x}$, $p_y = \dot{y}$,
and these form a dual vector $p\rightarrow$. The force of constraint is
$F\rightarrow = dp\rightarrow/dt$. Let $w\rightarrow = (\operatorname{grad} C)\rightarrow$ be the gradient of the
constraint function. The vectors $F\rightarrow$ and $w\rightarrow$ both belong to the
space of dual vectors, and they are parallel to each other. Since
we do happen to have a metric in this example, it is also possible 
to say, as most people would, that the force is perpendicular to 
the wall. 


Now consider the example shown in figure j. The arm’s weight is
negligible compared to the unit mass of the gripped weight, and
both the upper and lower arm have unit length. Elbows don’t bend
backward, so we have a constraint $C < 0$, where $C = \theta - \phi$, and
as before we can define a dual vector $w\rightarrow = (\operatorname{grad} C)\rightarrow$
that is parallel to the line of constraint in the $(\theta, \phi)$ plane. The
conjugate momenta (which are actually angular momenta) turn
out to be $p_\theta = \dot\theta + \cos(\phi - \theta)\,\dot\phi$ and a similar expression for $p_\phi$. As in
the example of the billiard ball, the force of constraint is parallel to
$w\rightarrow$. There is no metric that naturally applies to the $(\theta, \phi)$ plane,
so we have no notion of perpendicularity, and it doesn’t make
sense to say that $F\rightarrow$ is perpendicular to the line of constraint.
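
For readers who want to see where the conjugate momenta come from, here is a sketch of the calculation. It assumes that the two angles are each measured from the same fixed direction, with the shoulder at the origin, and that any potential energy depends only on the angles and therefore does not affect the momenta. These are assumptions made for this sketch; they are not spelled out in the example itself.

```latex
\begin{align*}
x &= \cos\theta + \cos\phi, \qquad y = \sin\theta + \sin\phi \\
K &= \tfrac{1}{2}\left(\dot{x}^2 + \dot{y}^2\right)
   = \tfrac{1}{2}\dot\theta^2 + \tfrac{1}{2}\dot\phi^2
     + \dot\theta\,\dot\phi\,\cos(\phi - \theta) \\
p_\theta &= \frac{\partial K}{\partial \dot\theta}
   = \dot\theta + \cos(\phi - \theta)\,\dot\phi \\
p_\phi &= \frac{\partial K}{\partial \dot\phi}
   = \dot\phi + \cos(\phi - \theta)\,\dot\theta
\end{align*}
```

The cross term in the kinetic energy is what mixes the two angular velocities in each momentum, and it is also why no obvious Euclidean-style metric on the space of the two angles presents itself.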


j/ There is no natural metric
on the space $(\theta, \phi)$.


k/The free-falling observer
considers P and Q to be simulta- 
neous. 


Finally we remark that since four-dimensional Galilean spacetime 
lacks a metric (see p. 101), the distinction between vectors and 
dual vectors in Galilean relativity is a real and physically important 
one. The only reason people were historically able to ignore this 
distinction was that Galilean spacetime splits into independent 
time and spatial parts, with the spatial part having a metric. 


No simultaneity without a metric Example: 7 
We'll see in section 2.2 that one way of defining the distinction be- 
tween Galilean and Lorentz geometry is that in Lorentzian space- 
time, simultaneity is observer-dependent. Without a metric, there 
can be no notion of simultaneity at all, not even a frame-dependent 
one. In figure k, the fact that the observer considers events P 
and Q to be simultaneous is represented by the fact that the 
observer's displacement vector $\rightarrow o$ is perpendicular to the displacement
$\rightarrow s$ from P to Q. In affine geometry, we can’t express
perpendicularity. 


Abstract index notation 


Expressions in birdtracks notation such as 


$\text{\textcircled{$C$}}\rightarrow s$


can be awkward to type on a computer, which is why we’ve already
been occasionally resorting to more linear notations such as
$(\operatorname{grad} C)\rightarrow s$. As we encounter more complicated birdtracks, the diagrams
will sometimes look like complicated electrical schematics,
and the problem of generating them on a keyboard will get more 
acute. There is in fact a systematic way of representing any such 
expression using only ordinary subscripts and superscripts. This is 
called abstract index notation, and was introduced by Roger Pen- 
rose at around the same time he invented birdtracks. For practical 
reasons, it was the abstract index notation that caught on. 


The idea is as follows. Suppose we wanted to describe a compli- 
cated birdtrack verbally, so that someone else could draw it. The 
diagram would be made up of various smaller parts, a typical one 
looking something like the scalar product $u \rightarrow v$. The verbal instructions
might be: “We have an object u with an arrow coming out of
it. For reference, let’s label this arrow as a. Now remember that 
other object v I had you draw before? There was an arrow coming 
into that one, which we also labeled a. Now connect up the two 
arrows labeled a.” 


Shortening this lengthy description to its bare minimum, Penrose 
renders it like this: $u_a v^a$. Subscripts depict arrows coming out of
a symbol (think of water flowing from a tank out through a pipe 
below). Superscripts indicate arrows going in. When the same letter 
is used as both a superscript and a subscript, the two arrows are to 


be piped together. 
Abstract index notation evolved out of an earlier one called
the Einstein summation convention, in which superscripts and subscripts
referred to specific coordinates. For example, we might take
0 to be the time coordinate, 1 to be x, and so on. A symbol like $u_\nu$
would then indicate a component of the dual vector u, which could
be its x component if $\nu$ took on the value 1. Repeated indices were
summed over. 


The advantage of the birdtrack and abstract index notations is 
that they are coordinate-independent, so that an equation written 
in them is valid regardless of the choice of coordinates. The Einstein 
and Penrose notations look very similar, so for example if we want 
to take a general result expressed in Penrose notation and apply it 
in a specific coordinate system, there is essentially no translation 
required. In fact, the two notations look so similar that we need an 
explicit way to tell which is which, so that we can tell whether or 
not a particular result is coordinate-independent. We therefore use 
the convention that Latin indices represent abstract indices, whereas 
Greek ones imply a specific coordinate system and can take on numerical
values, e.g., $\nu = 1$.
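
In a specific coordinate system, an expression with a repeated Greek index is just a sum over components. The following minimal sketch shows this for a scalar product in two dimensions. The function name `contract` and the component values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def contract(u_lower, v_upper):
    """Sum u_nu * v^nu over the repeated index nu."""
    if len(u_lower) != len(v_upper):
        raise ValueError("index ranges must match")
    return sum(u_lower[nu] * v_upper[nu] for nu in range(len(u_lower)))

# Components in a (t, x) coordinate system: index 0 is t, index 1 is x.
u = [2.0, -1.0]   # lower-index components of a dual vector
v = [3.0, 4.0]    # upper-index components of a vector

scalar = contract(u, v)

# Rescale the x coordinate by a factor of 2: the vector's x component doubles,
# the dual vector's x component is halved, and the scalar is unchanged.
u_rescaled = [u[0], u[1] / 2.0]
v_rescaled = [v[0], v[1] * 2.0]
scalar_rescaled = contract(u_rescaled, v_rescaled)
```

The individual components change with the coordinates, but the contracted result does not, which is why the same expression can be read as a coordinate-independent statement in abstract index notation.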


2.2 Relativistic properties of Lorentz geometry 


We now want to pin down the properties of the Lorentz geometry 
that are left unspecified by the affine treatment. We need some 
further input from experiments in order to show us how to proceed. 
We take the following as empirical facts about flat spacetime:⁵


L1 Spacetime is homogeneous and isotropic. No time or place 
has special properties that make it distinguishable from other 
points, nor is one direction in space distinguishable from
another.⁶


L2 Inertial frames of reference exist. These are frames in which 
particles move at constant velocity if not subject to any forces. 
We can construct such a frame by using a particular particle, 
which is not subject to any forces, as a reference point. 


L3 Equivalence of inertial frames: If a frame is in constant-velocity 
translational motion relative to an inertial frame, then it is also 
an inertial frame. No experiment can distinguish one preferred 
inertial frame from all the others. 


L4 Causality: There exist events 1 and 2 such that $t_1 < t_2$ in all
frames. 


⁵ These facts are summarized for convenience on page 412 in the back of the
book.

⁶ For the experimental evidence on isotropy, see http://www.edu-observatory.org/physics-faq/Relativity/SR/experiments.html#Tests_of_isotropy_of_space.

a/Two objects at rest have
world-lines that define a rectangle.
In a second frame of
reference in motion relative to the 
first one, the rectangle becomes 
a parallelogram. 


L5 Relativity of time: There exist events 1 and 2 and frames of 
reference (t,x) and (t’,x’) such that $t_1 < t_2$, but $t'_1 > t'_2$.


L4 makes it possible to have an event 1 that causes an event 2, 
with all observers agreeing on which caused which. L5 is supported 
by the experimental evidence in section 1.2; if L5 were false, then 
space and time could work as imagined by Galileo and Newton. 


Define affine parameters t and x for time and position, and construct
a (t,x) plane. Axiom L1 guarantees that spacetime is flat,
allowing us to do this; if spacetime had, for example, a curvature 
like that of a sphere, then the axioms of affine geometry would fail, 
and it would be impossible to lay out such a global grid of paral- 
lels. Although affine geometry treats all directions symmetrically, 
we’re going beyond the affine aspects of the space, and t does play 
a different role than x here, as shown, for example, by L4 and L5. 


In the (t,x) plane, consider a rectangle with one corner at the 
origin O. We can imagine its right and left edges as representing the 
world-lines of two objects that are both initially at rest in this frame; 
they remain at rest (L2), so the right and left edges are parallel. 


How do we know that this is a rectangle and not some other kind 
of parallelogram? In purely affine geometry, there is no notion of 
perpendicularity, so this distinction is meaningless. But implicit in 
the existence of inertial frames (L2) is the assumption that spacetime 
has some additional structure that allows a particular observer to 
decide what events he considers to be simultaneous (example 7, 
p. 50). He then considers his own world-line, i.e., his t axis, to 
be perpendicular to a proposed x axis if points on the x axis are 
simultaneous to him. 


We now define a second frame of reference such that the origins 
of the two frames coincide, but they are in motion relative to one 
another with velocity v. The transformation L from the first frame 
to the second is referred to as a Lorentz boost with velocity v. L
depends on v. By equivalence of inertial frames (L3), an observer in 
the new frame considers his own t axis to be perpendicular to his own 
x, even though they don’t look that way in figure a. Thus, although 
we assume some notion of perpendicularity, we do not assume that 
it looks the same as the Euclidean one. 


By homogeneity of spacetime (L1), L must be linear, so the
original rectangle will be transformed into a parallelogram in the 
new frame; this is also consistent with L3, which requires that the 
world-lines on the right and left edges remain parallel. The left edge 
has inverse slope v. By L5 (no simultaneity), the top and bottom 
edges are no longer horizontal. 


For simplicity, let the original rectangle have unit area. Then 
the area of the new parallelogram is still 1, by the following argument.
Let the new area be A, which is a function of v. By isotropy
of spacetime (L1), $A(v) = A(-v)$. Furthermore, the function A(v)
must have some universal form for all geometrical figures, not just
for a figure that is initially a particular rectangle; this follows because
of our definition of affine area in terms of a dissection by
a two-dimensional lattice, which we can choose to be a lattice of
squares. Applying boosts $+v$ and $-v$ one after another results in a
transformation back into our original frame of reference, and since
A is universal for all shapes, it doesn’t matter that the second transformation
starts from a parallelogram rather than a square. Scaling
the area once by A(v) and again by $A(-v)$ must therefore give back
the original square with its original unit area, $A(v)A(-v) = 1$, and
since $A(v) = A(-v)$, $A(v) = \pm 1$ for any value of v. Since A(0) = 1
and A varies continuously with v, we must have A(v) = 1 for all v.
The argument is independent of the shape of the region, so we
conclude that all areas are preserved by Lorentz boosts. (See
subsection 4.6.3 on p. 155 for further interpretation of A.)


If we consider a boost by an infinitesimal velocity dv, then the 
vanishing change in area comes from the sum of the areas of the four 
infinitesimally thin slivers where the rectangle lies either outside the 
parallelogram (call this negative area) or inside it (positive). (We 
don’t worry about what happens near the corners, because such 
effects are of order $dv^2$.) In other words, area flows around in the
x-t plane, and the flows in and out of the rectangle must cancel.
Let v be positive; the flow at the sides of the rectangle is then to the 
right. The flows through the top and bottom cannot be in opposite 
directions (one up, one down) while maintaining the parallelism of 
the opposite sides, so we have the following three possible cases: 


I There is no flow through the top and bottom. This case cor- 
responds to Galilean relativity, in which the rectangle shears 
horizontally under a boost, and simultaneity is preserved, vi- 
olating L5. 


II Area flows downward at both the top and the bottom. The 
flow is clockwise at both the positive t axis and the positive 
x axis. This makes it plausible that the flow is clockwise everywhere
in the (t,x) plane, and the proof is straightforward.⁷


⁷ Proof: By linearity of L, the flow is clockwise at the negative axes as well.


b/ Flows of area: (I) a shear that preserves simultaneity, (II) a
rotation, (III) upward flow at all edges.


c/ Unit square PQRS is Lorentz-
boosted to the parallelogram
P’Q’R’S’.


As v increases, a particular element of area flows continually 
clockwise. This violates L4, because two events with a cause 
and effect relationship could be time-reversed by a Lorentz 
boost. 


III Area flows upward at both the top and the bottom. 


Only case III is possible, and given case III, there must be at least 
one point P in the first quadrant where area flows neither clockwise 
nor counterclockwise.⁸ The boost simply increases P’s distance from
the origin by some factor. By the linearity of the transformation, 
the entire line running through O and P is simply rescaled. This 
special line’s inverse slope, which has units of velocity, apparently 
has some special significance, so we give it a name, c. We’ll see later 
that c is the maximum speed of cause and effect whose existence 
we inferred in section 1.3. Any world-line with a velocity equal to 
c retains the same velocity as judged by moving observers, and by 
isotropy the same must be true for —c. 


For convenience, let’s adopt time and space units in which c = 1, 
and let the original rectangle be a unit square. The upper right 
tip of the parallelogram must slide along the line through the origin 
with slope +1, and similarly the parallelogram’s other diagonal must 
have a slope of —1. Since these diagonals bisected one another on 
the original square, and since bisection is an affine property that 
is preserved when we change frames of reference, the parallelogram 
must be equilateral. 


We can now determine the complete form of the Lorentz transformation.
Let unit square PQRS, as described above, be transformed
to parallelogram P’Q’R’S’ in the new coordinate system (x’,t’). Let
the t’ coordinate of R’ be $\gamma$, interpreted as the ratio between the
time elapsed on a clock moving from P’ to R’ and the corresponding
time as measured by a clock that is at rest in the (x’,t’) frame. By
the definition of v, R’ has coordinates $(v\gamma, \gamma)$, and the other geometrical
facts established above place Q’ symmetrically on the other
side of the diagonal, at $(\gamma, v\gamma)$. Computing the cross product of vectors
P’R’ and P’Q’, we find the area of P’Q’R’S’ to be $\gamma^2(1 - v^2)$,
and setting this equal to 1 gives


$\gamma = \frac{1}{\sqrt{1 - v^2}}$
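
Written out step by step, with P’ taken as the origin and points listed as (x’,t’), the area computation described above can be sketched as follows. The arrow notation for the two edge vectors is introduced here only for this illustration, and the positive root is chosen so that the factor equals 1 when v = 0.

```latex
\begin{align*}
\overrightarrow{P'Q'} &= (\gamma,\ v\gamma), \qquad
\overrightarrow{P'R'} = (v\gamma,\ \gamma) \\
\text{area} &= \left|\overrightarrow{P'Q'} \times \overrightarrow{P'R'}\right|
  = \gamma\cdot\gamma - (v\gamma)(v\gamma)
  = \gamma^2\left(1 - v^2\right) \\
\gamma^2\left(1 - v^2\right) &= 1
  \quad\Longrightarrow\quad
  \gamma = \frac{1}{\sqrt{1 - v^2}}
\end{align*}
```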


Also by linearity, the handedness of the flow is the same at all points on a ray
extending out from the origin in the direction $\theta$. If the flow were counterclockwise
somewhere, then it would have to switch handedness twice in that quadrant, at
$\theta_1$ and $\theta_2$. But by writing out the vector cross product $\mathbf{r} \times d\mathbf{r}$, where $d\mathbf{r}$ is the
displacement caused by L(dv), we find that it depends on $\sin(2\theta + \delta)$, which does
not oscillate rapidly enough to have two zeroes in the same quadrant.

⁸ This follows from the fact that, as shown in the preceding footnote, the
handedness of the flow depends only on $\theta$.
Self-check: Interpret the dependence of $\gamma$ on the sign of v.


The result for the transformation L, a Lorentz boost along the 
x axis with velocity v, is: 


$t' = \gamma t + v\gamma x$
$x' = v\gamma t + \gamma x$


The symmetry of P’Q’R’S’ with respect to reflection across the 
diagonal indicates that the time and space dimensions are treated 
symmetrically, although they are not entirely interchangeable as 
they would have been in case II. 
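
The properties established so far can be checked numerically. The following is a minimal sketch, not a definitive implementation: the function names `gamma` and `boost` and the sample velocity 0.6 are choices made for this illustration, the units are those in which c = 1, and the sign convention is the one used in the transformation equations above.

```python
import math

def gamma(v):
    """Lorentz factor in units where c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def boost(v, t, x):
    """Return (t', x') with t' = g t + v g x and x' = v g t + g x."""
    g = gamma(v)
    return (g * t + v * g * x, v * g * t + g * x)

v = 0.6
g = gamma(v)   # approximately 1.25

# Areas scale by the determinant of the matrix [[g, v g], [v g, g]].
det = g * g - (v * g) * (v * g)

# Boosting by +v and then by -v should return the original event.
t1, x1 = boost(v, 2.0, 1.0)
t2, x2 = boost(-v, t1, x1)

# Events on the diagonals x = t and x = -t stay on them and are only rescaled.
tp, xp = boost(v, 1.0, 1.0)    # rescaled by g (1 + v)
tm, xm = boost(v, 1.0, -1.0)   # rescaled by g (1 - v)
```

The determinant comes out equal to 1 up to rounding, which is the area-preservation property, and the two diagonals are the special lines whose slopes are unchanged by the boost.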


A measuring rod, unlike a clock, sweeps out a two-dimensional
strip on an x-t graph. As in Galilean relativity, the two observers
disagree on the positions of events at the two ends of their rods,
but in addition they disagree on the simultaneity of such events.
Calculation shows that a moving rod appears contracted by a factor
$\gamma$.

In summary, a clock runs fastest according to an observer who
is at rest relative to the clock, and a measuring rod likewise appears
longest in its own rest frame.
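
The calculation behind the contraction of a moving rod can be sketched numerically. The rod is taken to be at rest in the (t,x) frame with its ends at x = 0 and x = 1, and we ask where the two ends are at a single instant of the boosted observer's time. The function names and the sample velocity are hypothetical, and the boost uses the same sign convention as the transformation equations above.

```python
import math

def gamma(v):
    """Lorentz factor in units where c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def rod_length_in_boosted_frame(rest_length, v, t_prime=0.0):
    """Length of a rod at rest in (t, x), as judged at one instant t' in (t', x')."""
    g = gamma(v)
    ends = []
    for x in (0.0, rest_length):
        # Solve t' = g t + v g x for the unprimed time t on this end's world-line.
        t = (t_prime - v * g * x) / g
        # Position x' of this end at the chosen primed time.
        ends.append(v * g * t + g * x)
    return ends[1] - ends[0]

v = 0.6
length = rod_length_in_boosted_frame(1.0, v)
length_later = rod_length_in_boosted_frame(1.0, v, t_prime=3.7)
```

The result equals the rest length divided by the Lorentz factor, and it does not depend on which instant of primed time is chosen. The key step is that the two ends are compared at equal t’, which corresponds to unequal t.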


The lack of a universal notion of simultaneity has a similarly 
symmetric interpretation. In prerelativistic physics, points in space 
have no fixed identity. A brass plaque commemorating a Civil War 


d/ The behavior of the $\gamma$ factor.


e/Example 8. Flashes of
light travel along P’T’ and Q’T’.
The observer in this frame of 
reference judges them to have 
been emitted at different times, 
and to have traveled different 
distances. 


battle is not at the same location as the battle, according to an 
observer who perceives the Earth as having been hurtling through
space for the intervening centuries. By symmetry, points in time 
have no fixed identity either. 


In everyday life, we don’t notice relativistic effects like time dilation,
so apparently $\gamma \approx 1$, and $v \ll 1$, i.e., the speed c must be
very large when expressed in meters per second. By setting c equal
to 1, we have chosen a distance unit that is extremely long in
proportion to the time unit. This is an example of the correspon- 
dence principle, which states that when a new physical theory, such 
as relativity, replaces an old one, such as Galilean relativity, it must 
remain “backward-compatible” with all the experiments that ver- 
ified the old theory; that is, it must agree with the old theory in 
the appropriate limit. Despite my coyness, you probably know that 
the speed of light is also equal to c. It is important to emphasize, 
however, that light plays no special role in relativity, nor was it 
necessary to assume the constancy of the speed of light in order to 
derive the Lorentz transformation; we will in fact prove on page 67 
that photons must travel at c, and on page 129 that this must be 
true for any massless particle. 
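
To get a feeling for how small the effects are at everyday speeds, here is a rough sketch that evaluates the departure of the Lorentz factor from 1. It uses the SI value of the speed of light, and the three sample speeds are round illustrative figures, not data from the text. The function name `gamma_minus_one` is hypothetical; the formula inside it is an algebraic rearrangement chosen to avoid round-off when v is much less than c.

```python
import math

C_SI = 299_792_458.0   # speed of light in m/s (exact by definition of the meter)

def gamma_minus_one(v_si):
    """Return gamma - 1 for a speed given in m/s."""
    beta2 = (v_si / C_SI) ** 2
    s = math.sqrt(1.0 - beta2)
    # Algebraically equal to 1/s - 1, but without subtracting nearly equal numbers.
    return beta2 / (s * (1.0 + s))

samples = [("highway car", 30.0), ("jet airliner", 250.0), ("low earth orbit", 7800.0)]
for label, v in samples:
    print(f"{label:>16}: v/c = {v / C_SI:.2e}, gamma - 1 = {gamma_minus_one(v):.2e}")
```

For speeds like these the result is very nearly half the square of v/c, which is why time dilation goes unnoticed in ordinary life and why the Galilean description works so well in that limit.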


On the other hand, Einstein did originally develop relativity 
based on a different set of assumptions than our L1-L5. His treat- 
ment, given in his 1905 paper “On the electrodynamics of moving 
bodies,” is reproduced on p. ??. It starts from the following two 
postulates: 

P1 The principle of relativity: “...the phenomena of electrody- 
namics as well as of mechanics possess no properties corre- 
sponding to the idea of absolute rest.” 


P2 “... light is always propagated in empty space with a definite 
velocity c which is independent of the state of motion of the 
emitting body.” 


Einstein’s P1 is essentially the same as our L3 (equivalence of in- 
ertial frames). He implicitly assumes something equivalent to our 
L1 (homogeneity and isotropy of spacetime). In his system, our 
L5 (relativity of time) is a theorem proved from the axioms P1-P2, 
whereas in our system, his P2 is a theorem proved from the axioms 
L1-L5. 
Example: 8
Let the intersection of the parallelogram’s two diagonals be T in 
the original (rest) frame, and T’ in the Lorentz-boosted frame. An 
observer at T in the original frame simultaneously detects the 
passing by of the two flashes of light emitted at P and Q, and 
since she is positioned at the midpoint of the diagram in space, 
she infers that P and Q were simultaneous. Since the arrival of 
both flashes of light at the same point in spacetime is a concrete 
event, an observer in the Lorentz-boosted frame must agree on 
their simultaneous arrival. (Simultaneity is well defined as long 
as no spatial separation is involved.) But the distances traveled
by the two flashes in the boosted frame are unequal, and since 
the speed of light is the same in all cases, the boosted observer 
infers that they were not emitted simultaneously. 


Example: 9 
A different kind of symmetry is the symmetry between observers. 
If observer A says observer B’s time is slow, shouldn’t B say that 
A’s time is fast? This is what would happen if B took a pill that 
slowed down all his thought processes: to him, the rest of the 
world would seem faster than normal. But this can’t be correct 
for Lorentz boosts, because it would introduce an asymmetry be- 
tween observers. There is no preferred, “correct” frame corre- 
sponding to the observer who didn’t take a pill; either observer 
can correctly consider himself to be the one who is at rest. It may 
seem paradoxical that each observer could think that the other 
was the slow one, but the paradox evaporates when we consider 
the methods available to A and B for resolving the controversy. 
They can either (1) send signals back and forth, or (2) get to- 
gether and compare clocks in person. Signaling doesn’t estab- 
lish one observer as correct and one as incorrect, because as 
we'll see in the following section, there is a limit to the speed of 
propagation of signals; either observer ends up being able to ex- 
plain the other observer's observations by taking into account the 
finite and changing time required for signals to propagate. Meet- 
ing in person requires one or both observers to accelerate, as in 
the original story of Alice and Betty, and then we are no longer 
dealing with pure Lorentz frames, which are described by non- 
accelerating observers. 


Einstein's goof Example: 10 
Einstein’s original 1905 paper on special relativity, reproduced on 
p. ??, contains a famous incorrect prediction, that “a spring-clock 
at the equator must go more slowly, by a very small amount, than 
a precisely similar clock situated at one of the poles under other- 
wise identical conditions” (p. ??). This was a reasonable predic- 
tion at the time, but we now know that it was incorrect because 
it neglected gravitational time dilation. In the description of the 
Hafele-Keating experiment using atomic clocks aboard airplanes 
(p. 15), we saw that both gravity and motion had effects on the
rate of flow of time. On p. 32 we found based on the equivalence
principle that the gravitational redshift of an electromagnetic wave
is ΔE/E = ΔΦ (where c = 1 and Φ is the gravitational potential
gy), and that this could also be interpreted as a gravitational time
dilation Δt/t = ΔΦ.


The clock at the equator suffers a kinematic time dilation that 
would tend to cause it to run more slowly than the one at the 
pole. However, the earth is not a sphere, so the two clocks are 
at different distances from the earth’s center, and the field they 
inhabit is also not the simple field of a sphere. This suggests that 
there may be an additional gravitational effect due to ΔΦ ≠ 0.
Expanding the Lorentz gamma factor in a Taylor series, we find
that the kinematic effect amounts to Δt/t = γ − 1 ≈ v²/2. The
mismatch in rates between the two clocks is

\[ \frac{\Delta t}{t} = \frac{1}{2}v^2 - \Delta\Phi, \]

where ΔΦ = Φ_equator − Φ_pole, and a factor of 1/c² on the right is
suppressed because c = 1. But this expression for Δt/t vanishes
exactly (see below). We therefore find that a change of latitude 
should have no effect on the rate of a clock, provided that it re- 
mains at sea level. 


This has been verified experimentally by Alley et al.⁹ Alley’s group
flew atomic clocks from Washington, DC to Thule, Greenland, left 
them there for four days, and brought them back. The difference 
between the clocks that went to Greenland and other clocks that 
stayed in Washington was 38 ± 5 ns, which was consistent with
the 35 ± 2 ns effect predicted purely based on kinematic and
gravitational time dilation while the planes were in the air. If Einstein’s
1905 prediction had been correct, then there would have been an 
additional difference of 224 ns due to the difference in latitude. 


The perfect cancellation of kinematic and gravitational effects is 
not entirely obvious, and it is easy to get the analysis wrong or 
oversimplify it. It arises because the surface of the earth’s oceans 
is in equilibrium. This is most easily understood in the rotating 
frame, as discussed in example 18 on p. 114. 


GPS Example: 11 
In the GPS system, as in example 10, both gravitational and
kinematic time dilation must be considered. Let’s determine the
directions and relative strengths of the two effects in the case
of a GPS satellite.


⁹ C.O. Alley, et al., in NASA Goddard Space Flight Center, Proc. of the
13th Ann. Precise Time and Time Interval (PTTI) Appl. and Planning
Meeting, p. 687-724, 1981, available online at
http://www.pttimeeting.org/archivemeetings/index9.html


A radio photon emitted by a GPS satellite gains energy as it falls
to the earth’s surface, so its energy and frequency are increased 
by this effect. The observer on the ground, after accounting for 
all non-relativistic effects such as Doppler shifts and the Sagnac 
effect (p. 73), would interpret the frequency shift by saying that 
time aboard the satellite was flowing more quickly than on the 
ground. 


However, the satellite is also moving at orbital speeds, so there is 
a Lorentz time dilation effect. According to the observer on earth, 
this causes time aboard the satellite to flow more slowly than on 
the ground. 


We can therefore see that the two effects are of opposite sign. 
Which is stronger? 


For a satellite in low earth orbit, we would have v²/r = g, where
r is only slightly greater than the radius of the earth. The relative
effect on the flow of time is γ − 1 ≈ v²/2 = gr/2. The
gravitational effect, approximating g as a constant, is −gy, where y
is the satellite’s altitude above the earth. For such a satellite, the 
gravitational effect is down by a factor of 2y/r, so the Lorentz time 
dilation dominates. 


GPS satellites, however, are not in low earth orbit. They orbit 
at an altitude of about 20,200 km, which is quite a bit greater 
than the radius of the earth. We therefore expect the gravitational 
effect to dominate. To confirm this, we need to generalize the 
equation Δt/t = ΔΦ (with c = 1) from example 10 to the case
where g is not a constant. Integrating the equation dt/t = dΦ,
we find that the time dilation factor is equal to e^ΔΦ. When ΔΦ is
small, e^ΔΦ ≈ 1 + ΔΦ, and we have a relative effect equal to ΔΦ.
The total effect for a GPS satellite is thus (inserting factors of c for
calculation with SI units, and using positive signs for blueshifts)

\[ \frac{1}{c^2}\left(\Delta\Phi - \frac{v^2}{2}\right) = 5.2\times10^{-10} - 0.9\times10^{-10}, \]

where the first term is gravitational and the second kinematic. A
more detailed analysis includes various time-varying effects, but 
this is the constant part. For this reason, the atomic clocks aboard 
the satellites are set to a frequency of 10.22999999543 MHz be- 
fore launching them into orbit; on the average, this is perceived 
on the ground as 10.23 MHz. A more complete analysis of the 
general relativity involved in the GPS system can be found in the 
review article by Ashby.¹⁰
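
The two terms in the GPS estimate can be checked numerically. The following is a minimal Python sketch, not the analysis used in the references: it assumes a circular orbit and a spherical, nonrotating earth, and it uses standard values of GM, c, and the earth's mean radius that do not appear in the text. The function name is hypothetical. Under these simplifications the results should land close to, though not exactly on, the rounded figures quoted above.

```python
# Rough estimate of the constant relativistic clock offset for a GPS satellite.
# Assumed inputs (not from the text): GM, c, and the earth's mean radius.
GM = 3.986004e14        # m^3/s^2, earth's gravitational parameter
C = 2.99792458e8        # m/s, speed of light
R_EARTH = 6.371e6       # m, mean radius of the earth
ALTITUDE = 2.02e7       # m, orbital altitude quoted in the text


def gps_fractional_shifts():
    """Return (gravitational, kinematic) fractional rate shifts, both positive."""
    r_sat = R_EARTH + ALTITUDE
    # Potential difference between the satellite and the ground.
    delta_phi = GM * (1.0 / R_EARTH - 1.0 / r_sat)
    # For a circular orbit, v^2 = GM/r.
    v_squared = GM / r_sat
    gravitational = delta_phi / C**2
    kinematic = 0.5 * v_squared / C**2
    return gravitational, kinematic


grav, kin = gps_fractional_shifts()
net = grav - kin  # positive means the satellite clock runs fast
print(f"gravitational: {grav:.2e}")
print(f"kinematic:     {kin:.2e}")
print(f"net:           {net:.2e}")
# Frequency to which a nominal 10.23 MHz clock would be set before launch.
print(f"preset frequency: {10.23 * (1.0 - net):.11f} MHz")
```

The small differences from the quoted numbers come from the neglected rotation and oblateness of the earth and from rounding.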


Self-check: Suppose that positioning a clock at a certain distance
from a certain planet produces a fractional change δ in the


¹⁰ N. Ashby, “Relativity in the Global Positioning System,”
http://www.livingreviews.org/lrr-2003-1


f/ Apparatus used for the test of
relativistic time dilation described 
in example 12. The promi- 
nent black and white blocks are 
large magnets surrounding a cir- 
cular pipe with a vacuum inside. 
(c) 1974 by CERN. 


rate at which time flows. In other words, the time dilation factor
is 1 + δ. Now suppose that a second, identical planet is brought
into the picture, at an equal distance from the clock. The clock is
positioned on the line joining the two planets’ centers, so that the
gravitational field it experiences is zero. Is the fractional time
dilation now approximately 0, or approximately 2δ? Why is this only
an approximation? 


Large time dilation Example: 12
The time dilation effect in the Hafele-Keating experiment was very 
small. If we want to see a large time dilation effect, we can’t do 
it with something the size of the atomic clocks they used; the 
kinetic energy would be greater than the total megatonnage of 
all the world’s nuclear arsenals. We can, however, accelerate 
subatomic particles to speeds at which γ is large. An early,
low-precision experiment of this kind was performed by Rossi and Hall
in 1941, using naturally occurring cosmic rays. Figure f shows a
1974 experiment¹¹ of a similar type which verified the time
dilation predicted by relativity to a precision of about one part per
thousand. 


Muons were produced by an accelerator at CERN, near Geneva. 
A muon is essentially a heavier version of the electron. Muons
undergo radioactive decay, lasting an average of only 2.197 μs
before they evaporate into an electron and two neutrinos. The 1974
experiment was actually built in order to measure the magnetic 
properties of muons, but it produced a high-precision test of time 
dilation as a byproduct. Because muons have the same electric 
charge as electrons, they can be trapped using magnetic fields. 
Muons were injected into the ring shown in figure f, circling around 
it until they underwent radioactive decay. At the speed at which
these muons were traveling, they had γ = 29.33, so on the
average they lasted 29.33 times longer than the normal lifetime. In
other words, they were like tiny alarm clocks that self-destructed 
at a randomly selected time. Figure g shows the number of ra- 
dioactive decays counted, as a function of the time elapsed af- 
ter a given stream of muons was injected into the storage ring. 
The two dashed lines show the rates of decay predicted with and 
without relativity. The relativistic line is the one that agrees with 
experiment. 
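
The numbers in this example are easy to reproduce. Here is a minimal Python sketch, assuming simple exponential decay in the lab frame with the mean lifetime stretched by γ; the function names and the 100 μs sample time are hypothetical choices for illustration, not values from the experiment.

```python
import math

MUON_REST_LIFETIME_US = 2.197   # mean lifetime at rest, in microseconds (from the text)
GAMMA = 29.33                   # Lorentz factor of the stored muons (from the text)


def dilated_lifetime_us(rest_lifetime_us, gamma):
    """Mean lifetime as measured in the lab frame."""
    return gamma * rest_lifetime_us


def speed_from_gamma(gamma):
    """Speed in units of c corresponding to a given gamma."""
    return math.sqrt(1.0 - 1.0 / gamma**2)


def surviving_fraction(t_lab_us, gamma, rest_lifetime_us=MUON_REST_LIFETIME_US):
    """Fraction of muons not yet decayed after lab time t_lab_us."""
    return math.exp(-t_lab_us / dilated_lifetime_us(rest_lifetime_us, gamma))


print(f"lab-frame mean lifetime: {dilated_lifetime_us(MUON_REST_LIFETIME_US, GAMMA):.2f} us")
print(f"speed: {speed_from_gamma(GAMMA):.5f} c")
# Compare survival after 100 us with and without time dilation (gamma = 1).
print(f"surviving at 100 us, relativistic:    {surviving_fraction(100.0, GAMMA):.3f}")
print(f"surviving at 100 us, nonrelativistic: {surviving_fraction(100.0, 1.0):.1e}")
```

The gap between the last two numbers is the difference between the two dashed lines in figure g.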


Time dilation in the Pound-Rebka experiment Example: 13 
In the description of the Pound-Rebka experiment on page 34, I
postponed the quantitative estimation of the frequency shift due
to temperature. Classically, one expects only a broadening of
the line, since the Doppler shift is proportional to v∥/c, where
v∥, the component of the emitting atom’s velocity along the line
of sight, averages to zero. But relativity tells us to expect that if
the emitting atom is moving, its time will flow more slowly, so the
frequency of the light it emits will also be systematically shifted
downward. This frequency shift should increase with temperature.
In other words, the Pound-Rebka experiment was designed
as a test of general relativity (the equivalence principle), but this
special-relativistic effect is just as strong as the general-relativistic
one, and needed to be accounted for carefully.


¹¹ Bailey et al., Nucl. Phys. B150 (1979) 1


g /Muons accelerated to nearly c
undergo radioactive decay much 
more slowly than they would 
according to an observer at rest 


h/The change in the frequency
of x-ray photons emitted by ⁵⁷Fe
as a function of temperature,
drawn after Pound and Rebka
(1960). Dots are experimental 
measurements. The solid curve 
is Pound and Rebka’s theoretical 
calculation using the Debye the- 
ory of the lattice vibrations with 
a Debye temperature of 420 de- 
grees C. The dashed line is one 
with the slope calculated in the 
text using a simplified treatment 
of the thermodynamics. There is 
an arbitrary vertical offset in the 
experimental data, as well as the 
theoretical curves. 


In Pound and Rebka’s paper describing their experiment,¹² they
refer to a preliminary measurement¹³ in which they carefully
measured this effect, showed that it was consistent with theory, and
pointed out that a previous claim by Cranshaw et al. of having 
measured the gravitational frequency shift was vitiated by their 
failure to control for the temperature dependence. 


It turns out that the full Debye treatment of the lattice vibrations is
not really necessary near room temperature, so we'll simplify the
thermodynamics. At absolute temperature T, the mean translational
kinetic energy of each iron nucleus is (3/2)kT. The velocity
is much less than c (= 1), so we can use the nonrelativistic
expression for kinetic energy, K = (1/2)mv², which gives a mean value
for v² of 3kT/m. In the limit of v ≪ 1, time dilation produces a
change in frequency by a factor of 1/γ, which differs from unity
by approximately −v²/2. The relative time dilation is therefore
−3kT/2m, or, in metric units, −3kT/2mc². The vertical scale
in figure h contains an arbitrary offset, since Pound and Rebka’s
measurements were the best absolute measurements to date of
the frequency. The predicted slope of −3k/2mc², however, is
not arbitrary. Plugging in 57 atomic mass units for m, we find
the slope to be 2.4 × 10⁻¹⁵, which, as shown in the figure, is an
excellent approximation (off by only 10%) near room temperature.
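
The slope can be evaluated directly. This is a minimal Python sketch of the simplified estimate above (not the Debye calculation), assuming standard SI values for Boltzmann's constant, the atomic mass unit, and c; the function name is hypothetical.

```python
# Slope of the fractional frequency shift versus temperature, -3k/(2 m c^2),
# from the simplified (non-Debye) treatment in the text.
K_BOLTZMANN = 1.380649e-23   # J/K
AMU = 1.66053907e-27         # kg, atomic mass unit
C = 2.99792458e8             # m/s


def thermal_shift_slope(mass_amu):
    """Fractional frequency shift per kelvin for an emitter of the given mass."""
    mass_kg = mass_amu * AMU
    return -3.0 * K_BOLTZMANN / (2.0 * mass_kg * C**2)


slope = thermal_shift_slope(57)   # iron-57
print(f"slope: {slope:.2e} per kelvin")
# Fractional shift accumulated over a 1 K temperature difference between
# source and absorber, for comparison with the size of the gravitational effect.
print(f"shift for a 1 K difference: {slope * 1.0:.2e}")
```

The negative sign reflects the downward frequency shift with increasing temperature; the magnitude is the number quoted in the text.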


2.2.1 Geodesics and stationary action 


One way of characterizing geodesics in spacetime is by using 
an action principle. This is similar to characterizing a geodesic in 
Euclidean space as a line of minimum length between two points. 
For a timelike geodesic from event P to event Q, we have a proper 
time τ. In Lorentz spacetime, this proper time is greater than it
would have been for any non-geodesic motion from P to Q. In curved 
spacetime, we must weaken this statement somewhat. The proper 
time may not be a global maximum, but it is stationary. Stationarity 
means that if we vary the curve by some small amount, not moving 
any part of it by a coordinate distance greater than ε, then the
change in τ is of order ε².


For spacelike geodesics in Lorentz spacetime, the proper length 
is stationary, but small deformations of the curve can either increase 
or decrease the proper length. The stationary action approach does 
not work well for lightlike geodesics. 


¹² Phys. Rev. Lett. 4 (1960) 337
¹³ Phys. Rev. Lett. 4 (1960) 274


2.3 The light cone


Given an event P, we can now classify all the causal relationships in 
which P can participate. In Newtonian physics, these relationships 
fell into two classes: P could potentially cause any event that lay in 
its future, and could have been caused by any event in its past. In a
Lorentz spacetime, we have a trichotomy rather than a dichotomy. 
There is a third class of events that are too far away from P in space, 
and too close in time, to allow any cause and effect relationship, since 
causality’s maximum velocity is c. Since we’re working in units in 
which c = 1, the boundary of this set is formed by the lines with 
slope ±1 on a (t,x) plot. This is referred to as the light cone, and in
the generalization from 1+1 to 3+1 dimensions, it literally becomes 
a (four-dimensional) cone. The terminology comes from the fact 
that light happens to travel at c, the maximum speed of cause and 
effect. If we make a cut through the cone defined by a surface of con- 
stant time in P’s future, the resulting section is a sphere (analogous 
to the circle formed by cutting a three-dimensional cone), and this 
sphere is interpreted as the set of events on which P could have had 
a causal effect by radiating a light pulse outward in all directions. 


Events lying inside one another’s light cones are said to have 
a timelike relationship. Events outside each other’s light cones are 
spacelike in relation to one another, and in the case where they lie 
on the surfaces of each other’s light cones the term is lightlike. 
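
The trichotomy can be expressed as a simple test on the coordinate differences between two events. The following is a minimal Python sketch in units with c = 1, assuming flat (Lorentz) spacetime and coordinates taken from a single inertial frame; the function name and the tolerance parameter are hypothetical.

```python
def classify_separation(dt, dx, dy=0.0, dz=0.0, tol=1e-12):
    """Classify the relationship between two events in flat spacetime (c = 1).

    dt is the difference in time and (dx, dy, dz) the difference in position,
    all measured in the same inertial frame.
    """
    # Inside the light cone the time difference exceeds the spatial distance,
    # so this quantity is positive; outside, it is negative.
    interval = dt**2 - (dx**2 + dy**2 + dz**2)
    if abs(interval) <= tol:
        return "lightlike"
    return "timelike" if interval > 0 else "spacelike"


print(classify_separation(2.0, 1.0))        # time difference dominates
print(classify_separation(1.0, 2.0))        # spatial distance dominates
print(classify_separation(5.0, 3.0, 4.0))   # on the light cone
```

Because Lorentz boosts preserve the light cone, the classification returned for a pair of events does not depend on which inertial frame supplied the coordinates.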


The light cone plays the same role in the Lorentz geometry that 
the circle plays in Euclidean geometry. The truth or falsehood of 
propositions in Euclidean geometry remains the same regardless of 
how we rotate the figures, and this is expressed by Euclid’s E3 as- 
serting the existence of circles, which remain invariant under rota- 
tion. Similarly, Lorentz boosts preserve light cones and truth of 
propositions in a Lorentz frame. 


Self-check: Under what circumstances is the time-ordering of 
events P and Q preserved under a Lorentz boost? 


In a uniform Lorentz spacetime, all the light cones line up like 
soldiers with their axes parallel with one another. When gravity is 
present, however, this uniformity is disturbed in the vicinity of the 
masses that constitute the sources. The light cones lying near the 
sources tip toward the sources. Superimposed on top of this gravi- 
tational tipping together, recent observations have demonstrated a 
systematic tipping-apart effect which becomes significant on
cosmological distance scales. The parameter Λ that sets the strength of
this effect is known as the cosmological constant. The cosmologi- 
cal constant is not related to the presence of any sources (such as 
negative masses), and can be interpreted instead as a tendency for 
space to expand over time on its own initiative. In the present era, 
the cosmological constant has overpowered the gravitation of the 
universe’s mass, causing the expansion of the universe to accelerate. 


a/The light cone in 2+1 dimensions. (Figure labels: axes t, x, and y;
timelike and spacelike regions.)


b/The circle plays a privileged
role in Euclidean geometry.
When rotated, it stays the same. 
The pie slice is not invariant as 
the circle is. A similar privileged 
place is occupied by the light 
cone in Lorentz geometry. Under 
a Lorentz boost, the spacetime 
parallelograms change, but the 
light cone doesn’t. 


Self-check: In the bottom panel of figure c, can an observer look
at the properties of the spacetime in her immediate vicinity and 
tell how much her light cones are tipping, and in which direction? 
Compare with figure j on page 28. 


c/Light cones tip over for 
two reasons in general relativity: 
because of the presence of 
masses, which have gravita- 
tional fields, and because of 
the cosmological constant. The 
time and distance scales in the 
bottom figure are many orders of 
magnitude greater than those in 
the top. 


d/Example 14. Matter is 
lifted out of a Newtonian black 
hole with a bucket. The dashed 
line represents the point at which 
the escape velocity equals the 
speed of light. 


A Newtonian black hole Example: 14 

In the case of a black hole, the light cone tips over so far that 
the entire future timelike region lies within the black hole. If an 
observer is present at such an event, then that observer’s en- 
tire potential future lies within the black hole, not outside it. By 
expanding on the logical consequences of this statement, we ar- 
rive at an example of relativity’s proper interpretation as a theory 
of causality, not a theory of objects exerting forces on one an- 
other as in Newton's vision of action at a distance, or Lorentz’s 
original ether-drag interpretation of the factor γ, in which length
contraction arose from a physical strain imposed on the atoms 
composing a physical body. 


Imagine a black hole from a Newtonian point of view, as proposed 
in 1783 by geologist John Michell. Setting the escape velocity 
equal to the speed of light, we find that this will occur for any
gravitating spherical body compact enough to have M/r > c²/2G.
(A fully relativistic argument, as given in section 6.2, agrees on
M/r ∝ c²/G, which is fixed by units. The correct unitless factor
depends on the definition of r, which is flexible in general
relativity.) A flash of light emitted from the surface of such a Newtonian
black hole would fall back down like water from a fountain, but 
it would nevertheless be possible for physical objects to escape, 
e.g., if they were lifted out in a bucket dangling from a cable. If the 
cable is to support its own weight, it must have a tensile strength 
per unit density of at least c²/2, which is about ten orders of
magnitude greater than that of carbon nanotube fibers. (The factor of
1/2 is not to be taken seriously, since it comes from a nonrela- 
tivistic calculation.) 
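
To get a feel for the scale set by Michell's condition M/r > c²/2G, one can solve it for the radius at which the Newtonian escape velocity equals c. The following is a minimal Python sketch, assuming a standard value of GM for the sun (not given in the text) and purely Newtonian formulas; the function names are hypothetical.

```python
C = 2.99792458e8            # m/s, speed of light
GM_SUN = 1.32712440018e20   # m^3/s^2, assumed standard value for the sun


def michell_radius(gm):
    """Radius at which the Newtonian escape velocity equals c: r = 2GM/c^2."""
    return 2.0 * gm / C**2


def escape_velocity(gm, r):
    """Newtonian escape velocity from radius r of a spherical body."""
    return (2.0 * gm / r) ** 0.5


r_sun = michell_radius(GM_SUN)
print(f"Michell radius for one solar mass: {r_sun / 1000.0:.2f} km")
print(f"escape velocity there, in units of c: {escape_velocity(GM_SUN, r_sun) / C:.6f}")
```

As the example notes, the factor of 2 here comes from a nonrelativistic calculation and should not be taken too seriously.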


The cause-and-effect interpretation of relativity tells us that this 
Newtonian picture is incorrect. A physical object that approaches 
to within a distance r of a concentration of mass M, with M/r 
sufficiently large, has no causal future lying at larger values of r. 
The conclusion is that there is a limit on the tensile strength of 
any substance, imposed purely by general relativity, and we can 
state this limit without having to know anything about the physical 
nature of the interatomic forces. A more complete treatment of 
the tension in the rope is given in example 5 on p. 305. Cf. also 
homework problem 4 and section 3.5.4, as well as some refer- 
ences given in the remark following problem 4. 


2.3.1 Velocity addition


In classical physics, velocities add in relative motion. For exam- 
ple, if a boat moves relative to a river, and the river moves relative 
to the land, then the boat’s velocity relative to the land is found 
by vector addition. This linear behavior cannot hold relativistically. 
For example, if a spaceship is moving at 0.60c relative to the earth, 
and it launches a probe at 0.60c relative to itself, we can’t have the 
probe moving at 1.20c relative to the earth, because this would be 


greater than the maximum speed of cause and effect, c. To see how
to add velocities relativistically, we start by rewriting the Lorentz
transformation as the matrix

\[ \begin{pmatrix} \cosh\eta & \sinh\eta \\ \sinh\eta & \cosh\eta \end{pmatrix}, \]

where η = tanh⁻¹ v is called the rapidity. We are guaranteed that
the matrix can be written in this form, because its area-preserving
property says that the determinant equals 1, and cosh² η − sinh² η =
1 is an identity of the hyperbolic trig functions. It is now
straightforward to verify that multiplication of two matrices of this form
gives a third matrix that is also of this form, with η = η₁ + η₂. In
other words, rapidities add linearly; velocities don’t. In the example
of the spaceship and the probe, the rapidities add as tanh⁻¹ .60 +
tanh⁻¹ .60 = .693 + .693 = 1.386, giving the probe a velocity of
tanh 1.386 = 0.88 relative to the earth. Any number of velocities
can be added in this way, η₁ + η₂ + ... + ηₙ.


e/ The rapidity, η = tanh⁻¹ v, as a function of v.


Self-check: Interpret the asymptotes of the graph in figure e. 
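
The rule that rapidities add while velocities do not is easy to try out numerically. Here is a minimal Python sketch in units with c = 1, restricted to collinear velocities; the function names are hypothetical. The closed form (u + v)/(1 + uv) used for comparison is the standard result of expanding tanh(η₁ + η₂), and is not derived in the text above.

```python
import math


def rapidity(v):
    """Rapidity corresponding to velocity v (|v| < 1, units with c = 1)."""
    return math.atanh(v)


def add_velocities(*velocities):
    """Combine any number of collinear velocities by summing their rapidities."""
    total_rapidity = sum(rapidity(v) for v in velocities)
    return math.tanh(total_rapidity)


# The spaceship and probe from the text: 0.60 combined with 0.60.
print(f"rapidity of 0.60: {rapidity(0.60):.3f}")
print(f"combined velocity: {add_velocities(0.60, 0.60):.2f}")

# Cross-check against the closed-form two-velocity formula.
u, v = 0.3, 0.5
print(abs(add_velocities(u, v) - (u + v) / (1 + u * v)) < 1e-12)

# Stacking many boosts never reaches the speed of light.
print(add_velocities(*[0.6] * 10) < 1.0)
```

Working with rapidities keeps the bookkeeping linear, so adding a third or fourth boost requires no new formula.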


Bell's spaceship paradox Example: 15 

A difficult philosophical question is whether the time dilation and
length contractions predicted by relativity are “real.” This
depends, of course, on what one means by “real.” They are
frame-dependent, i.e., observers in different frames of reference
disagree about them. But this doesn’t tell us much about their reality,
since velocities are frame-dependent in Newtonian mechanics,
but nobody worries about whether velocities are real. John Bell
(1928-1990) proposed the following thought experiment to
physicists in the CERN cafeteria, and found that nearly all of them got
it wrong. He took this as evidence that their intuitions had been
misguided by the standard way of approaching this question of
the reality of Lorentz contractions.


Let spaceships A and B accelerate as shown in figure f along a 
straight line. Observer C does not accelerate. The accelerations, 
as judged by C, are constant, and equal for the two ships. Each 
ship is equipped with a yard-arm, and a thread is tied between 
the two arms. Does the thread break, due to Lorentz contraction? 
(We assume that the acceleration is gentle enough that the thread 
does not break simply because of its own inertia.) 


f / Example 15. 


The popular answer in the CERN cafeteria was that the thread
would not break, the reasoning being that Lorentz contraction is 
a frame-dependent effect, and no such contraction would be ob- 
served in A and B’s frame. The ships maintain a constant dis- 
tance from one another, so C merely disagrees with A and B 
about the length of the thread, as well as other lengths like the 
lengths of the spaceships. 


The error in this reasoning is that the accelerations of A and B 
were specified to be equal and constant in C’s frame, not in A and 
B’s. Bell's interpretation is that the frame-dependence is a distrac- 
tion, that Lorentz contraction is in some sense a real effect, and 
that it is therefore immediately clear that the thread must break, 
without even having to bother going into any other frame. To con- 
vince his peers in the cafeteria, however, Bell presumably needed 
to satisfy them as to the specific errors in their reasoning, and this 
requires that we consider the frame-dependence explicitly. 


We can first see that it is impossible, in general, for different ob- 
servers to agree about what is meant by constant acceleration. 
Suppose that A and B agree with C about the constancy of their 
acceleration. Then A and B experience a voyage in which the 
rapidities of the stars around them (and of observer C) increase 
linearly with time. As the rapidity approaches infinity, both C and 
the stars approach the speed of light. But since A and C agree on 
the magnitude of their velocity relative to one another, this means 
that A’s velocity as measured by C must approach c, and this 
contradicts the premise that C observes constant acceleration for 
both ships. Therefore A and B do not consider their own acceler- 
ations to be constant. 


A and B do not agree with C about simultaneity, and since they 
also do not agree that their accelerations are constant, they do 
not consider their own accelerations to be equal at a given mo- 
ment of time. Therefore the string changes its length, and this 
is consistent with Bell’s original, simple answer, which did not re- 
quire comparing different frames of reference. To establish that 
the string comes under tension, rather than going slack, we can 
apply the equivalence principle. By the equivalence principle, any 
experiments done by A and B give the same results as if they 
were immersed in a gravitational field. The leading ship B sees A 
as experiencing a gravitational time dilation. According to B, the 
slowpoke A isn’t accelerating as rapidly as it should, causing the 
string to break. 


These ideas are closely related to the fact that general relativity 
does not admit any spacetime that can be interpreted as a uni- 
form gravitational field (see problem 7, p. 209). 


2.3.2 Logic


The trichotomous classification of causal relationships has in- 
teresting logical implications. In classical Aristotelian logic, every 
proposition is either true or false, but not both, and given
propositions p and q, we can form propositions such as p ∧ q (both p and
q) or p ∨ q (either p or q). Propositions about physical phenomena
can only be verified by observation. Let p be the statement that
a certain observation carried out at event P gives a certain result,
and similarly for q at Q. If PQ is spacelike, then the truth or
falsehood of p ∧ q cannot be checked by physically traveling to P and
Q, because no observer would be able to attend both events. The
truth-value of p ∧ q is unknown to any observer in the universe until
a certain time, at which the relevant information has been able to
propagate back and forth. What if P and Q lie inside two different
black holes? Then the truth-value of p ∧ q can never be determined
by any observer. Another example is the case in which P and Q 
are separated by such a great distance that, due to the accelerating 
expansion of the universe, their future light cones do not overlap. 


We conclude that Aristotelian logic cannot be appropriately ap- 
plied to relativistic observation in this way. Some workers attempt- 
ing to construct a quantum-mechanical theory of gravity have sug- 
gested an even more radically observer-dependent logic, in which 
different observers may contradict one another on the truth-value of 
a single proposition p₁, unless they agree in advance on the list p₂,
p₃, ... of all the other propositions that they intend to test as well.
We'll return to these questions on page 252. 


2.4 Experimental tests of Lorentz geometry 


We’ve already seen, in section 1.2, a variety of evidence for the non- 
classical behavior of spacetime. We’re now in a position to discuss 
tests of relativity more quantitatively. An up-to-date review of such 
tests is given by Mattingly.¹⁴


One such test is that relativity requires the speed of light to 
be the same in all frames of reference, for the following reasons. 
Compare with the speed of sound in air. The speed of sound is not 
the same in all frames of reference, because the wave propagates 
at a fixed speed relative to the air. An observer who is moving
relative to the air will measure a different speed of sound. Light, on 
the other hand, isn’t a vibration of any physical medium. Maxwell’s 
equations predict a definite value for the speed of light, regardless 
of the motion of the source. This speed also can’t be relative to 
any medium. If the speed of light isn’t fixed relative to the source, 
and isn’t fixed relative to a medium, then it must be fixed relative 
to anything at all. The only speed in relativity that is equal in all 


¹⁴ livingreviews.org/lrr-2005-5


a/An artist's conception of a
gamma-ray burst, resulting from 
a supernova explosion. 


frames of reference is c, so light must propagate at c. We will see 
on page 129 that there is a deeper reason for this; relativity requires 
that any massless particle propagate at c. The requirement of v = c 
for massless particles is so intimately hard-wired into the structure 
of relativity that any violation of it, no matter how tiny, would be of 
great interest. Essentially, such a violation would disprove Lorentz 
invariance, i.e., the invariance of the laws of physics under Lorentz 
transformations. There are two types of tests we could do: (1) 
test whether photons of all energies travel at the same speed, i.e., 
whether the vacuum is dispersive; (2) test whether observers in all 
frames of reference measure the same speed of light. 


2.4.1 Dispersion of the vacuum 


Some candidate quantum-mechanical theories of gravity, such 
as loop quantum gravity, predict a granular structure for spacetime 
at the Planck scale, √(ħG/c³) = 10⁻³⁵ m, which one could imagine
might lead to deviations from v = 1 that would become more and 
more significant for photons with wavelengths getting closer and 
closer to that scale. Lorentz-invariance would then be an approxi- 
mation valid only at large scales. It turns out that the state of the 
art in loop quantum gravity is not yet sufficient to say whether or 
not such an effect should exist. 


Presently the best experimental tests of the invariance of the 
speed of light with respect to wavelength come from astronomical 
observations of gamma-ray bursts, which are sudden outpourings of 
high-energy photons, believed to originate from a supernova
explosion in another galaxy. One such observation, in 2009,¹⁵ collected
photons from such a burst, with a duration of 2 seconds, indicating 
that the propagation time of all the photons differed by no more 
than 2 seconds out of a total time in flight on the order of ten bil- 
lion years, or about one part in 10!"! A single superlative photon in 
the burst had an energy of 31 GeV, and its arrival within the same 
2-second time window demonstrates Lorentz invariance over a vast 
range of photon energies, contradicting heuristic estimates that had 
been made by some researchers in loop quantum gravity. 
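
The figure of one part in $10^{17}$ can be checked with a quick estimate. This is only an order-of-magnitude sketch, assuming a flight time of exactly $10^{10}$ years and about $3.2\times10^{7}$ seconds per year; neither number is taken from the paper itself.

```latex
\[
  t_{\text{flight}} \sim 10^{10}\ \text{yr} \times 3.2\times10^{7}\ \text{s/yr}
  \approx 3\times10^{17}\ \text{s}
\]
\[
  \frac{\Delta t}{t_{\text{flight}}}
  \lesssim \frac{2\ \text{s}}{3\times10^{17}\ \text{s}}
  \approx 6\times10^{-18}
\]
```

That is a fractional difference of less than about one part in $10^{17}$, which is also an upper limit on the fractional difference in speed between the photons.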


2.4.2 Observer-independence of c 


The constancy of the speed of light for observers in all frames of 
reference was originally detected in 1887 when Michelson and Morley 
set up a clever apparatus to measure any difference in the speed of 
light beams traveling east-west and north-south. The motion of 
the earth around the sun at 110,000 km/hour (about 0.01% of the 
speed of light) is to our west during the day. Michelson and Morley 
believed that light was a vibration of a physical medium, the ether, 
so they expected that the speed of light would be a fixed value 
relative to the ether. As the earth moved through the ether, they
thought they would observe an effect on the velocity of light along
an east-west line. For instance, if they released a beam of light in
a westward direction during the day, they expected that it would
move away from them at less than the normal speed because the
earth was chasing it through the ether. They were surprised when
they found that the expected 0.01% change in the speed of light did
not occur.

Although the Michelson-Morley experiment was nearly two decades
in the past by the time Einstein published his first paper on
relativity in 1905, and Einstein did know about it,[16] it’s unclear how
much it influenced him. Michelson and Morley themselves were
uncertain about whether the result was to be trusted, or whether
systematic and random errors were masking a real effect from the ether.
There were a variety of competing theories, each of which could
claim some support from the shaky data. Some physicists believed
that the ether could be dragged along by matter moving through it,
which inspired variations on the experiment that were conducted on
mountaintops in thin-walled buildings (figure c), or with one arm of
the apparatus out in the open, and the other surrounded by massive
lead walls. In the standard sanitized textbook version of the history
of science, every scientist does his experiments without any
preconceived notions about the truth, and any disagreement is quickly
settled by a definitive experiment. In reality, this period of
confusion about the Michelson-Morley experiment lasted for four decades,
and a few reputable skeptics, including Miller, continued to believe
that Einstein was wrong, and kept trying different variations of the
experiment as late as the 1920s. Most of the remaining doubters
were convinced by an extremely precise version of the experiment
performed by Joos in 1930, although you can still find kooks on the
internet who insist that Miller was right, and that there was a vast
conspiracy to cover up his results.

[15] http://arxiv.org/abs/0908.1832
[16] J. van Dongen, http://arxiv.org/abs/0908.1545

b / The Michelson-Morley experiment, shown in photographs, and
drawings from the original 1887 paper. 1. A simplified drawing of
the apparatus. A beam of light from the source, s, is partially
reflected and partially transmitted by the half-silvered mirror
$h_1$. The two half-intensity parts of the beam are reflected by the
mirrors at a and b, reunited, and observed in the telescope, t. If the
earth’s surface was supposed to be moving through the ether, then
the times taken by the two light waves to pass through the moving
ether would be unequal, and the resulting time lag would be
detectable by observing the interference between the waves when
they were reunited. 2. In the real apparatus, the light beams were
reflected multiple times. The effective length of each arm was
increased to 11 meters, which greatly improved its sensitivity to
the small expected difference in the speed of light. 3. In an
earlier version of the experiment, they had run into problems with
its “extreme sensitiveness to vibration,” which was “so great that
it was impossible to see the interference fringes except at brief
intervals ... even at two o’clock in the morning.” They therefore
mounted the whole thing on a massive stone floating in a pool of
mercury, which also made it possible to rotate it easily. 4. A photo
of the apparatus. Note that it is underground, in a room with solid
brick walls.


c/Dayton Miller thought that the result of the Michelson-Morley ex- 
periment could be explained because the ether had been pulled along by 
the dirt, and the walls of the laboratory. This motivated him to carry out a 
series of experiments at the top of Mount Wilson, in a building with thin 
walls. 


Before Einstein, some physicists who did believe the negative 
result of the Michelson-Morley experiment came up with explana- 
tions that preserved the ether. In the period from 1889 to 1895, both 
Lorentz and George FitzGerald suggested that the negative result 
of the Michelson-Morley experiment could be explained if the earth, 
and every physical object on its surface, was contracted slightly by 
the strain of the earth’s motion through the ether. Thus although 
Lorentz developed all the mathematics of Lorentz frames, and got 
them named after himself, he got the interpretation wrong. 


2.4.3 Lorentz violation by gravitational forces 


The tests described in sections 2.4.1 and 2.4.2 both involve the 
behavior of light, i.e., they test whether or not electromagnetism 
really has the exact Lorentz-invariant behavior contained implicitly 
in Maxwell’s equations. In the jargon of the field, they test Lorentz 
invariance in the “photon sector.” Since relativity is a theory of 
gravity, it is natural to ask whether the Lorentz invariance holds 
for gravitational forces as well as electromagnetic ones. If Lorentz 
invariance is violated by gravity, then the strength of gravitational 
forces might depend on the observer’s motion through space, rela- 
tive to some fixed reference frame analogous to that of the ether. 
Historically, gravitational Lorentz violations have been much more 
difficult to test, since gravitational forces are so weak, and the first 
high-precision data were obtained by Nordtvedt and Will in 1957, 
70 years after Michelson and Morley. Nordtvedt and Will measured
the strength of the earth’s gravitational field as a function of time,
and found that it did not vary on a 24-hour cycle with the earth’s 
rotation, once tidal effects had been accounted for. Further con- 
straints come from data on the moon’s orbit obtained by reflecting 
laser beams from a mirror left behind by the Apollo astronauts. 


A recent high-precision laboratory experiment was done in 2009
by Chung et al.[17] They constructed an interferometer in a vertical
plane that is conceptually similar to a Michelson interferometer,
except that it uses cesium atoms rather than photons. That is, 
the light waves of the Michelson-Morley experiment are replaced by 
quantum-mechanical matter waves. The roles of the half-silvered 
and fully silvered mirrors are filled by lasers, which kick the atoms 
electromagnetically. Each atom’s wavefunction is split into two 
parts, which travel by two different paths through spacetime, even- 
tually reuniting and interfering. The result is a measurement of g 
to about one part per billion. The results, shown in figure d, put a 
strict limit on violations of Lorentz geometry by gravity. 


2.5 Three spatial dimensions 

New and nontrivial phenomena arise when we generalize from 1+1 
dimensions to 3+1. 

2.5.1 Lorentz boosts in three dimensions 


How does a Lorentz boost along one axis, say x, affect the other 
two spatial coordinates y and z? 


First, we can rule out the possibility that such a transformation 
could have various terms such as $t' = \ldots + (\ldots)y + \ldots$. For example,


[17] arxiv.org/abs/0905.1929


d/The results of the measure- 
ment of g by Chung et al., sec- 
tion 2.4.3. The experiment was 
done on the Stanford University 
campus, surrounded by the Pacific
Ocean and San Francisco
Bay, so it was subject to varying
gravitational forces from both
astronomical bodies and the rising and
falling ocean tides. Once both of 
these effects are subtracted out 
of the data, there is no Lorentz- 
violating variation in g due to 
the earth’s motion through space. 
Note that the data are broken up 
into three periods, with gaps of 
three months and four years sep- 
arating them. (c) APS, used un- 
der the U.S. fair use exception to 
copyright. 


e / The matter interferometer used by Chung et al. Each atom’s
wavefunction is split into two parts, which travel along two
different paths (solid and dashed lines). The axes of the diagram
are time and height.


a / A boost along x followed by a boost along y results in
tangling up of the x and y coordinates, so the result is not just a
boost but a boost plus a rotation.


if the coefficient of y in this expression for $t'$ were positive for
v > 0, then the laws of physics
would be different from the laws that applied in a universe where
the y or t axis was inverted, but this would violate parity or
time-reversal symmetry. This establishes that observers in the two frames
agree on the directions of the y and z axes and on simultaneity along 
those axes when they coincide. 


Now suppose that two observers, in motion relative to one an- 
other along the x axis, each carry a stick, represented by line seg- 
ments AB and CD, oriented along the y axis, such that the bases 
of the sticks A and C coincide at some time. Due to the vanishing 
of the types of terms in the transformation referred to above, they 
agree that B and D are collinear with A (and C) at this time. Then 
by O3 and O4, either B lies between A and D, D lies between A and
B, or B=D. That is, they must agree whether the sticks are equal
in length or, if not, then on whose is longer. If one stick were
longer, this would violate L1, isotropy of space, since it would
distinguish +x from −x. The sticks must therefore be equal in length.


Another simple way to obtain this result is as follows. We have 
already proved that area in the (t,x) plane is preserved. The same 
proof applies to volume in the spaces (t,x,y) and (t,x, z), hence 
lengths in the y and z directions are preserved. (The proof does not 
apply to volume in, e.g., (x, y, z) space, because the x transformation 
depends on t, and therefore if we are given a region in (x, y, z), we
do not have enough information to say how it will change under a 
Lorentz boost.) 


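The complete form of the transformation $L(v\hat{\mathbf{x}})$, a Lorentz boost
along the x axis with velocity v, is therefore:

$$
\begin{aligned}
t' &= \gamma t + v\gamma x \\
x' &= v\gamma t + \gamma x \\
y' &= y \\
z' &= z
\end{aligned}
$$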
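
As a quick consistency check on the area-preservation argument, the determinant of the (t, x) part of this transformation can be worked out directly. This is a sketch that uses only the definition $\gamma = 1/\sqrt{1-v^2}$, in units with c = 1.

```latex
\[
  \det\begin{pmatrix} \gamma & v\gamma \\ v\gamma & \gamma \end{pmatrix}
  = \gamma^2 - v^2\gamma^2
  = \gamma^2\,(1 - v^2)
  = 1
\]
```

Since y and z are left unchanged, the determinant of the full four-dimensional transformation is also 1.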


Based on the trivial nature of this generalization, it might seem 
as though no qualitatively new considerations would arise in 3+1 
dimensions as compared with 1+1. To see that this is not the case, 
consider figure a. A boost along the x axis tangles up the x and 
t coordinates. A y-boost mingles y and t. Therefore consecutive 
boosts along x and y can cause x and y to mix. The result, as 
we’ll see in more detail below, is that two consecutive boosts along 
non-collinear axes are not equivalent to a single boost; they are 
equivalent to a boost plus a spatial rotation. The remainder of this 
section discusses this effect, known as Thomas precession, in more 
detail; it can be omitted on a first reading. 
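
One way to see the mixing of x and y explicitly is to multiply two boost matrices by hand. The sketch below works in the basis (t, x, y), takes both boosts to have the same speed v (an assumption made only to keep the algebra short), and uses the same sign convention as the boost along x written out above.

```latex
\[
  L(v\hat{\mathbf{x}})\,L(v\hat{\mathbf{y}})
  = \begin{pmatrix}
      \gamma & \gamma v & 0 \\
      \gamma v & \gamma & 0 \\
      0 & 0 & 1
    \end{pmatrix}
    \begin{pmatrix}
      \gamma & 0 & \gamma v \\
      0 & 1 & 0 \\
      \gamma v & 0 & \gamma
    \end{pmatrix}
  = \begin{pmatrix}
      \gamma^2 & \gamma v & \gamma^2 v \\
      \gamma^2 v & \gamma & \gamma^2 v^2 \\
      \gamma v & 0 & \gamma
    \end{pmatrix}
\]
```

The matrix of a single boost is symmetric, but this product is not: the entry in the x row and y column is $\gamma^2 v^2$, while the entry in the y row and x column is zero. The product therefore cannot be a pure boost.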


Self-check: Apply similar reasoning to a Galilean boost.


2.5.2 Gyroscopes and the equivalence principle


To see how this mathematical fact would play out as a physical 
effect, we need to consider how to make a physical manifestation of 
the concept of a direction in space. 


In two space dimensions, we can construct a ring laser, b/1, 
which in its simplest incarnation is a closed loop of optical fiber 
with a bidirectional laser inserted in one place. Coherent light tra- 
verses the loop simultaneously in both directions, interfering in a 
beat pattern, which can be observed by sampling the light at some 
point along the loop’s circumference. If the loop is rotated in its 
own plane, the interference pattern is altered, because the
beam-sampling device is in a different place, and the path lengths traveled
by the two beams have been altered. This phase shift is called the
Sagnac effect, after M. Georges Sagnac, who observed the effect in
1913 and interpreted it, incorrectly, as evidence for the existence of
the aether.[18] The loop senses its own angular velocity relative to
an inertial reference frame. If we transport the loop while always 
carefully adjusting its orientation so as to prevent phase shifts, then 
its orientation has been preserved. The atomic clocks used in the 
Hafele-Keating atomic-clock experiment described on page 15 were 
subject to the Sagnac effect. 


In three spatial dimensions, we could build a spherical cavity 
with a reflective inner surface, and release a photon inside, b/2. 


In reality, the photon-in-a-cavity is not very practical. The pho- 
ton would eventually be absorbed or scattered, and it would also be 
difficult to accurately initialize the device and read it out later. A 
more practical tool is a gyroscope. For example, one of the classic 
tests of general relativity is the 2007 Gravity Probe B experiment 
(discussed in detail on pages 170 and 224), in which four gyro- 
scopes aboard a satellite were observed to precess due to special- 
and general-relativistic effects. 


The gyroscope, however, is not so obviously a literal implementa- 
tion of our basic concept of a direction. How, then, can we be sure 
that its behavior is equivalent to that of the photon-in-a-cavity? 
We could, for example, carry out a complete mathematical
development of the angular momentum vector in relativity.[19] The
equivalence principle, however, allows us to bypass such technical details.
Suppose that we seal the two devices inside black boxes, with
identical external control panels for initializing them and reading them
out. We initialize them identically, and then transport them along
side-by-side world-lines. Nonrelativistically, both the mechanical
gyroscope and the photon-gyroscope would maintain absolute, fixed
directions in space. Relativistically, they will not necessarily
maintain their orientations. For example, we’ve already seen in section
2.5.1 that there are reasons to expect that their orientations will
change if they are subjected to accelerations that are not all along
the same line. Because relativity is a geometrical theory of
spacetime, this difference between the classical and relativistic behavior
must be determinable from purely geometrical considerations, such
as the shape of the world-line. If it depended on something else,
then we could conceivably see a disagreement in the outputs of the
two instruments, but this would violate the equivalence principle.


[18] Comptes rendus de l’Académie des sciences 157 (1913) 708
[19] This is done, for example, in Misner, Thorne, and Wheeler, Gravitation, pp.
157-159.


b / Inertial devices for maintaining a direction in space: 1.
A ring laser. 2. The photon in a perfectly reflective spherical
cavity. 3. A gyroscope. (Labels in the figure: bidirectional laser,
intensity sampler, gyroscope.)


c / A ring laser built for use in inertial guidance of
aircraft.


d / Nonrelativistically, the gyroscope should not rotate as long
as the forces from the hammer are all transmitted to it at its
center of mass.


Suppose there were such a discrepancy. That discrepancy would 
be a physically measurable property of the spacetime region through 
which the two gyroscopes had been transported. The effect would 
have a certain magnitude and direction, so by collecting enough data 
we could map it out as a vector field covering that region of spacetime.
This field evidently causes material particles to accelerate, since it
has an effect on the mechanical gyroscope. Roughly speaking (the
reasoning will be filled in more rigorously on page 142), the fact
that this field acts differently on the two gyroscopes is like getting a
non-null result from an Eötvös experiment, and it therefore violates
the equivalence principle. We conclude that gyroscopes b/2 and 
b/3 are equivalent. In other words, there can only be one uniquely 
defined notion of direction, and the details of how it is implemented 
are irrelevant. 


2.5.3 Boosts causing rotations 


As a quantitative example, consider the following thought ex- 
periment. Put a gyroscope in a box, and send the box around the 
square path shown in figure d at constant speed. The gyroscope de- 
fines a local coordinate system, which according to classical physics 
would maintain its orientation. At each corner of the square, the 
box has its velocity vector changed abruptly, as represented by the 
hammer. We assume that the hits with the hammer are transmitted 
to the gyroscope at its center of mass, so that they do not result 
in any torque. Nonrelativistically, if the set of gyroscopes travels 
once around the square, it should end up at the same place and 
in the same orientation, so that the coordinate system it defines is 
identical with the original one. 


For notation, let $L(v\hat{\mathbf{x}})$ indicate the boost along the x axis
described by the transformation on page 71. This is a transformation
that changes to a frame of reference moving in the negative x
direction compared to the original frame. A particle considered to be at
rest in the original frame is described in the new frame as moving
in the positive x direction. Applying such an L to a vector p, we
calculate Lp, which gives the coordinates of the event as measured
in the new frame. An expression like MLp is equivalent by
associativity to M(Lp), i.e., ML represents applying L first, and then
M.

In this notation, the hammer strikes can be represented by a
series of four Lorentz boosts,


$$T = L(v\hat{\mathbf{x}})\,L(v\hat{\mathbf{y}})\,L(-v\hat{\mathbf{x}})\,L(-v\hat{\mathbf{y}}),$$


where we assume that the square has negligible size, so that all four
Lorentz boosts act in a way that preserves the origin of the
coordinate systems. (We have no convenient way in our notation L(...) to
describe a transformation that does not preserve the origin.) The
first transformation, $L(-v\hat{\mathbf{y}})$, changes coordinates measured by the
original gyroscope-defined frame to new coordinates measured by
the new gyroscope-defined frame, after the box has been accelerated
in the positive y direction.


e / A page from one of Einstein’s notebooks.

The calculation of T is messy, and to be honest, I made a series
of mistakes when I tried to crank it out by hand. Calculations in
relativity have a reputation for being like this. Figure e shows a page
from one of Einstein’s notebooks, written in fountain pen around
1913. At the bottom of the page, he wrote “zu umstaendlich,”
meaning “too involved.” Luckily we live in an era in which this sort
of thing can be handled by computers. Starting at this point in the
book, I will take appropriate opportunities to demonstrate how to
use the free and open-source computer algebra system Maxima to
keep complicated calculations manageable. The following Maxima
program calculates a particular element of the matrix T.


1   /* For convenience, define gamma in terms of v: */
2   gamma:1/sqrt(1-v*v);
3   /* Define Lx as L(x-hat), Lmx as L(-x-hat), etc.: */
4   Lx:matrix([gamma, gamma*v, 0],
5             [gamma*v, gamma, 0],
6             [0, 0, 1]);
7   Ly:matrix([gamma, 0, gamma*v],
8             [0, 1, 0],
9             [gamma*v, 0, gamma]);
10  Lmx:matrix([gamma, -gamma*v, 0],
11             [-gamma*v, gamma, 0],
12             [0, 0, 1]);
13  Lmy:matrix([gamma, 0, -gamma*v],
14             [0, 1, 0],
15             [-gamma*v, 0, gamma]);
16  /* Calculate the product of the four matrices: */
17  T:Lx.Ly.Lmx.Lmy;
18  /* Define a column vector along the x axis: */
19  P:matrix([0],[1],[0]);
20  /* Find the result of T acting on this vector,
21     expressed as a Taylor series to second order in v: */
22  taylor(T.P,v,0,2);


Statements are terminated by semicolons, and comments are written
like /* ... */. On line 2, we see a symbolic definition of the
symbol gamma in terms of the symbol v. The colon means “is
defined as.” Line 2 does not mean, as it would in most programming
languages, to take a stored numerical value of v and use it to
calculate a numerical value of $\gamma$. In fact, v does not have a numerical
value defined at this point, nor will it ever have a numerical value
defined for it throughout this program. Line 2 simply means that
whenever Maxima encounters the symbol gamma, it should take it as
an abbreviation for the expression 1/sqrt(1-v*v). Lines 4-15 define
some 3 × 3 matrices that represent the L transformations. The basis
is t, x, y. Line 17 calculates the product of the four matrices; the
dots represent matrix multiplication. Line 19 defines a vector along
the x axis, expressed as a column matrix (three rows of one column
each) so that Maxima will know how to operate on it using matrix
multiplication by T.
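
To make the distinction between a symbolic definition and a numerical evaluation concrete, here is a minimal sketch. The sample value v = 3/5 is my own choice for illustration and is not used anywhere else in the text.

```maxima
/* Symbolic definition, as on line 2 of the program; v has no value. */
gamma:1/sqrt(1-v*v);
/* Substitute a sample velocity (an illustrative choice) into the
   expression. This does not assign any value to v itself. */
subst(3/5, v, gamma);
/* v is still an unbound symbol, so this simply echoes v: */
v;
```

Since 1 − (3/5)² = 16/25, the substitution corresponds to $\gamma = 5/4$, while v itself remains a bare symbol afterward.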


Finally, line 22 outputs[20] the result of T acting on P:


             [ 0 + . . .    ]
             [              ]
    (%o9)    [ 1 + . . .    ]
             [              ]
             [    2         ]
             [ - v  + . . . ]


[20] I’ve omitted some output generated automatically from the earlier steps in
the computation. The (%o9) indicates that this is Maxima’s output from the
ninth and final step.

Suppose that we use
the initial frame of reference, before T is applied, to determine that
a particular reference point, such as a distant star, is along the x
axis. Applying T, we get a new vector TP, which we find has a
non-vanishing y component approximately equal to $-v^2$. This result is
entirely unexpected classically. It tells us that the gyroscope, rather
than maintaining its original orientation as it would have done
classically, has rotated slightly. It has precessed in the counterclockwise
direction in the x-y plane, so that the direction to the star, as
measured in the coordinate system defined by the gyroscope, appears
to have rotated clockwise. As the box moved clockwise around the
square, the gyroscope has apparently rotated by a counterclockwise
angle $\chi = v^2$ about the z axis. We can see that this is a purely
relativistic effect, since for $v \ll 1$ the effect is small. For historical
reasons discussed in section 2.5.4, this phenomenon is referred to as
the Thomas precession.
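
The same calculation can be pushed a little further to get the y component exactly rather than as a truncated series. The following is only a sketch: the helper functions boost_x and boost_y are names I have made up for illustration, and the sample speed v = 1/10 is an arbitrary choice.

```maxima
/* Hypothetical helper functions (the names are mine): boosts in the
   basis (t, x, y), with the same sign convention as Lx and Ly. */
boost_x(u) := block([g: 1/sqrt(1-u^2)],
  matrix([g, g*u, 0], [g*u, g, 0], [0, 0, 1]));
boost_y(u) := block([g: 1/sqrt(1-u^2)],
  matrix([g, 0, g*u], [0, 1, 0], [g*u, 0, g]));
/* The same product of four boosts as in the program above: */
T: boost_x(v) . boost_y(v) . boost_x(-v) . boost_y(-v);
P: matrix([0], [1], [0]);
TP: T . P;
/* Exact y component of TP, simplified: */
ratsimp(TP[3,1]);
/* Its numerical value at the sample speed v = 1/10: */
float(subst(1/10, v, TP[3,1]));
```

Multiplying the matrices out by hand gives an exact y component of $-\gamma^2 v^2 = -v^2/(1-v^2)$, which agrees with $-v^2$ to second order. At v = 1/10 this is −1/99, or about −0.0101, compared with −0.01 from the leading term alone.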


The particular features of this square geometry are not necessary. 
I chose them so that (1) the boosts would be along the Cartesian 
axes, so that we would be able to write them down easily; (2) it is 
clear that the effect doesn’t arise from any asymmetric treatment 
of the spatial axes; and (3) the change in the orientation of the 
gyroscope can be measured at the same point in space, e.g., by 
comparing it with a twin gyroscope that stays at home. In general: 


A gyroscope transported around a closed loop in flat space- 
time changes its orientation compared with one that is not 
accelerated. 


This is a purely relativistic effect, since a Newtonian gyro- 
scope does not change its axis of rotation unless subjected to 
a torque; if the boosts are accomplished by forces that act at 
the gyroscope’s center of mass, then there is no nonrelativistic 
explanation for the effect. 


The effect can occur in the absence of any gravitational fields.
That is, this is a phenomenon of special relativity.


f / The velocity disk.


g / Two excursions in a rocket-ship: one along the y axis and
one along x.


The composition of two or more Lorentz boosts along different 
axes is not equivalent to a single boost; it is equivalent to a 
boost plus a spatial rotation. 


Lorentz boosts do not commute, i.e., it makes a difference what 
order we perform them in. Even if there is almost no time lag 
between the first boost and the second, the order of the boosts 
matters. If we had applied the boosts in the opposite order, 
the handedness of the effect would have been reversed. 
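
The failure of boosts to commute can be checked directly by computing the difference between the two orderings. This is a sketch under the same assumptions as the program above: the basis is t, x, y, both boosts have the same speed v, and the name C for the difference is my own.

```maxima
/* Boost matrices in the basis (t, x, y), as in the program above. */
gamma:1/sqrt(1-v*v);
Lx:matrix([gamma, gamma*v, 0], [gamma*v, gamma, 0], [0, 0, 1]);
Ly:matrix([gamma, 0, gamma*v], [0, 1, 0], [gamma*v, 0, gamma]);
/* This difference would vanish if the order did not matter: */
C:Lx.Ly-Ly.Lx;
/* Leading behavior for small v: */
taylor(C,v,0,2);
```

Worked by hand, the only entries of C that survive to second order in v are $+v^2$ in the x row, y column and $-v^2$ in the y row, x column. An antisymmetric matrix confined to the x-y block is the form of a small rotation in the x-y plane, and swapping the order of the boosts flips its sign.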


Self-check: If Lorentz boosts did commute, what would be the
consequences for the expression $L(v\hat{\mathbf{x}})\,L(v\hat{\mathbf{y}})\,L(-v\hat{\mathbf{x}})\,L(-v\hat{\mathbf{y}})$?


The velocity disk 


Figure f shows a useful way of visualizing the combined effects 
of boosts and rotations in 2+1 dimensions. The disk depicts all 
possible states of motion relative to some arbitrarily chosen frame 
of reference. Lack of motion is represented by the point at the 
center. A point at distance v from the center represents motion at
velocity v in a particular direction in the x-y plane. By drawing
little axes at a particular point, we can represent a particular frame
of reference: the frame is in motion at some velocity, with its own
x and y axes oriented in a particular way.


It turns out to be easier to understand the qualitative behavior 
of our mysterious rotations if we switch from the low-velocity limit 
to the contrary limit of ultrarelativistic velocities. Suppose we have 
a rocket-ship with an inertial navigation system consisting of two 
gyroscopes at right angles to one another. We first accelerate the 
ship in the y direction, and the acceleration is steady in the sense 
that it feels constant to observers aboard the ship. Since it is
rapidities, not velocities, that add linearly, this means that as an observer
aboard the ship reads clock times $\tau_1$, $\tau_2$, ..., all separated by equal
intervals $\Delta\tau$, the ship’s rapidity changes at a constant rate, $\eta_1$, $\eta_2$,
.... This results in a series of frames of reference that appear closer
and closer together on the diagram as the ship approaches the speed 
of light, at the edge of the disk. We can start over from the center 
again and repeat the whole process along the x axis, resulting in 
a similar succession of frames. In both cases, the boosts are being 
applied along a single line, so that there is no rotation of the x and 
y axes. 
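
The bunching of frames toward the edge of the disk is easy to see numerically. The sketch below uses the standard relation v = tanh η between velocity and rapidity, in units with c = 1; the step size of 1/2 in rapidity and the list names are arbitrary choices of mine.

```maxima
/* Equal steps in rapidity (the step size 1/2 is an illustrative
   choice), and the corresponding velocities v = tanh(eta). */
eta_list: makelist(n/2, n, 1, 6);
v_list: float(map(tanh, eta_list));
```

The velocities come out to approximately 0.462, 0.762, 0.905, 0.964, 0.987, and 0.995. Equal increments of rapidity produce smaller and smaller increments of velocity, so the corresponding points crowd together near the edge of the disk at v = 1.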


Now suppose that the ship were to accelerate along a route like 
the one shown in figure h. It first accelerates along the y axis at a 
constant rate (again, as judged by its own sensors), until its velocity 
is very close to the speed of light, A. It then accelerates, again at 
a self-perceived constant rate and with thrust in a fixed direction 
as judged by its own gyroscopes, until it is moving at the same 
ultrarelativistic speed in the x direction, B. Finally, it decelerates 
in the x direction until it is again at rest, O. This motion traces out
a clockwise loop on the velocity disk. The motion in space is also
clockwise.


We might naively think that the middle leg of the trip, from A 
to B, would be a straight line on the velocity disk, but this can’t be 
the case. First, we know that non-collinear boosts cause rotations. 
Traveling around a clockwise path causes counterclockwise rotation, 
and vice-versa. Therefore an observer in the rest frame O sees the 
ship (and its gyroscopes) as rotating as it moves from A to B. The 
ship’s trajectory through space is clockwise, so according to O the 
ship rotates counterclockwise as it goes from A to B. The ship is always
firing its engines in a fixed direction as judged by its gyroscopes, but 
according to O the ship is rotating counterclockwise, its thrust is 
progressively rotating counterclockwise, and therefore its trajectory 
turns counterclockwise. We conclude that leg AB on the velocity 
disk is concave, rather than being a straight-line hypotenuse of a 
triangle OAB. 


We can also determine, by the following argument, that leg AB 
is perpendicular to the edge of the disk where it touches the edge of 
the disk. In the transformation from frame A to frame O, y
coordinates are dilated by a factor of $\gamma$, which approaches infinity in the
limit we’re presently considering. Observers aboard the rocket-ship,
occupying frame A, believe that their task is to fire the rocket’s
engines at an angle of 45 degrees with respect to the y axis, so as
to eliminate their velocity with respect to the origin, and
simultaneously add an equal amount of velocity in the x direction. This
45-degree angle in frame A, however, is not a 45-degree angle in
frame O. From the stern of the ship to its bow we have
displacements $\Delta x$ and $\Delta y$, and in the transformation from A to O, $\Delta y$
is magnified almost infinitely. As perceived in frame O, the ship’s
orientation is almost exactly antiparallel to the y axis.[21]


[21] Although we will not need any more than this for the purposes of our present
analysis, a longer and more detailed discussion by Rhodes and Semon, www.
bates.edu/~msemon/RhodesSemonFinal.pdf, Am. J. Phys. 72(7), 2004, shows
that this type of inertially guided, constant-thrust motion is always represented
on the velocity disk by an arc of a circle that is perpendicular to the edge of
the disk. (We consider a diameter of the disk to be the limiting case of a circle with
infinite radius.)


h / A round-trip involving ultrarelativistic velocities. All three
legs are at constant acceleration.


i / In the limit where A and B are ultrarelativistic velocities, leg
AB is perpendicular to the edge of the velocity disk. The result is
that the x-y frame determined by the ship’s gyroscopes has
rotated by 90 degrees by the time it gets home.


As the ship travels from A to B, its orientation (as judged in
frame O) changes from $-y$ to x. This establishes, in a much more
direct fashion, the direction of the Thomas precession: its
handedness is contrary to the handedness of the direction of motion. We
can now also see something new about the fundamental reason for
the effect. It has to do with the fact that observers in different
states of motion disagree on spatial angles. Similarly, imagine that
you are a two-dimensional being who was told about the existence
of a new, third, spatial dimension. You have always believed that
the cosine of the angle between two unit vectors u and v is given by
the vector dot product $u_x v_x + u_y v_y$. If you were allowed to explore
a two-dimensional projection of a three-dimensional scene, e.g., on
the flat screen of a television, it would seem to you as if all the
angles had been distorted. You would have no way to interpret the
visual conventions of perspective. But once you had learned about
the existence of a z axis, you would realize that these angular
distortions were happening because of rotations out of the x-y plane.
Such rotations really conserve the quantity $u_x v_x + u_y v_y + u_z v_z$; only
because you were ignoring the $u_z v_z$ term did it seem that angles
were not being preserved. Similarly, the generalization from three
Euclidean spatial dimensions to 3+1-dimensional spacetime means
that three-dimensional dot products are no longer conserved.
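
The television analogy can be made concrete with a small worked example. The particular vectors and the choice of rotation axis are mine, picked only to keep the algebra short: u lies along x, v makes an angle $\alpha$ with it in the x-y plane, and v is then rotated by an angle $\beta$ about the x axis.

```latex
\[
  \mathbf{u} = (1, 0, 0), \qquad
  \mathbf{v} = (\cos\alpha, \sin\alpha, 0)
  \;\longrightarrow\;
  \mathbf{v}' = (\cos\alpha,\ \sin\alpha\cos\beta,\ \sin\alpha\sin\beta)
\]
\[
  \mathbf{u}\cdot\mathbf{v}' = \cos\alpha
  \qquad \text{(three-dimensional dot product, unchanged)}
\]
\[
  \cos\alpha_{\text{screen}}
  = \frac{\cos\alpha}{\sqrt{\cos^2\alpha + \sin^2\alpha\,\cos^2\beta}}
  \qquad \text{(angle between the projections onto the $x$-$y$ plane)}
\]
```

For generic $\alpha$ and $\beta$ the angle seen on the screen differs from $\alpha$, even though the true three-dimensional angle has not changed at all.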


The general low-v limit 


Let’s find the low-v limit of the Thomas precession in general,
not just in the highly artificial special case of $\chi \approx v^2$ for the example
involving the four hammer hits. To generalize to the case of smooth
acceleration, we first note that the rate of precession $d\chi/dt$ must
have the following properties.


It is odd under a reversal of the direction of motion, $\mathbf{v} \to -\mathbf{v}$.
(This corresponds to sending the gyroscope around the square
in the opposite direction.)


It is odd under a reversal of the acceleration due to the second
boost, $\mathbf{a} \to -\mathbf{a}$.


It is a rotation about the spatial axis perpendicular to the 
plane of the v and a vectors, in the opposite direction com- 
pared to the handedness of the curving trajectory. 


It is approximately linear in v and a, for small v and a. 


The only rotationally invariant mathematical operation that has
these symmetry properties is the vector cross product, so the rate
of precession must be $k\,\mathbf{a} \times \mathbf{v}$, where $k > 0$ is nearly
independent of v and a for small v and a.


To pin down the value of k, we need to find a connection between
our two results: $\chi \approx v^2$ for the four hammer hits, and
$d\chi/dt \approx k\,\mathbf{a} \times \mathbf{v}$ for smooth acceleration. We can do this by
considering the physical significance of areas on the velocity disk. As shown
in figure j, the rotation $\chi$ due to carrying the velocity around the
boundary of a region is additive when adjacent regions are joined
together. We can therefore find $\chi$ for any region by breaking the region
down into elements of area $dA$ and integrating their contributions
$d\chi$. What is the relationship between $dA$ and $d\chi$? The velocity
disk’s structure is nonuniform, in the sense that near the edge of
the disk, it takes a larger boost to move a small distance. But we’re
investigating the low-velocity limit, and in the low-velocity region
near the center of the disk, the disk’s structure is approximately
uniform. We therefore expect that there is an approximately
constant proportionality factor relating $dA$ and $d\chi$ at low velocities.
The example of the hammer corresponds geometrically to a square
with area $v^2$, so we find that this proportionality factor is unity,
$dA \approx d\chi$.
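The claim that a square of side v on the velocity disk gives a rotation of about $v^2$ can be checked numerically. The sketch below is not the Maxima program referred to elsewhere in the text. It is an independent illustration in Python, with c = 1, that multiplies four boost matrices acting on (t, x, y) and reads off the rotation angle from the spatial block. The function names and the choice v = 0.01 are assumptions made for the example.

```python
import math

def boost(vx, vy):
    """3x3 Lorentz boost acting on (t, x, y), in units with c = 1."""
    v2 = vx * vx + vy * vy
    if v2 == 0.0:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    g = 1.0 / math.sqrt(1.0 - v2)
    f = (g - 1.0) / v2
    return [
        [g, g * vx, g * vy],
        [g * vx, 1.0 + f * vx * vx, f * vx * vy],
        [g * vy, f * vx * vy, 1.0 + f * vy * vy],
    ]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def thomas_angle(v):
    """Rotation angle left over after four boosts of speed v around a square."""
    m = boost(v, 0.0)
    for step in (boost(0.0, v), boost(-v, 0.0), boost(0.0, -v)):
        m = matmul(m, step)
    # The spatial block is close to a pure rotation; extract its angle
    # from the antisymmetric part, which the small leftover boost does not affect.
    return math.atan2(m[2][1] - m[1][2], m[1][1] + m[2][2])

v = 0.01
theta = thomas_angle(v)
print("rotation angle:", abs(theta), " v^2:", v * v)
```

The sign of the angle depends on the order in which the boosts are applied, so only the magnitude is compared with $v^2$ here.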


To relate this to smooth acceleration, consider a particle performing
circular motion with period $T$, which has
$|\mathbf{a} \times \mathbf{v}| = 2\pi v^2/T$. Over one full period of the
motion, we have $\chi = \int k|\mathbf{a} \times \mathbf{v}|\,dt = 2\pi k v^2$,
and the particle’s velocity vector traces a circle of area $A = \pi v^2$
on the velocity disk. Equating $A$ and $\chi$, we find $k = 1/2$. The
result is that in the limit of low velocities, the rate of rotation is


$$\boldsymbol{\Omega} = \frac{1}{2}\,\mathbf{a} \times \mathbf{v},$$


where $\boldsymbol{\Omega}$ is the angular velocity vector of the rotation. In the special
case of circular motion, this can be written as $\Omega = (1/2)v^2\omega$, where
$\omega = 2\pi/T$ is the angular frequency of the motion.


2.5.4 An experimental test: Thomas precession in hydrogen 


If we want to see this precession effect in real life, we should look 
for a system in which both v and a are large. An atom is such a 
system. 


The Bohr model, introduced in 1913, marked the first quantitatively
successful, if conceptually muddled, description of the atomic
energy levels of hydrogen. Continuing to take c = 1, the overall
scale of the energies was calculated to be proportional to $m\alpha^2$, where
m is the mass of the electron, and $\alpha = ke^2/\hbar \approx 1/137$, known as the
fine structure constant, is essentially just a unitless way of
expressing the coupling constant for electrical forces. At higher resolution,
each excited energy level is found to be split into several sub-levels.
The transitions among these close-lying states are in the millimeter
region of the microwave spectrum. The energy scale of this fine
structure is $\sim m\alpha^4$. This is down by a factor of $\alpha^2$ compared to the
visible-light transitions, hence the name of the constant. Uhlenbeck
and Goudsmit showed in 1926 that a splitting on this order of
magnitude was to be expected due to the magnetic interaction between
the proton and the electron’s magnetic moment, oriented along its
spin. The effect they calculated, however, was too big by a factor
of two.
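To get a feeling for these two energy scales, here is a rough numerical sketch. The constants are standard values typed in by hand (electron rest energy, fine structure constant, and hc), and the variable names are chosen only for this illustration. The results are order-of-magnitude scales, not the energies of particular levels.

```python
# Standard constants, entered by hand for this illustration.
ELECTRON_REST_ENERGY_EV = 510998.95   # m c^2 for the electron, in eV
ALPHA = 1.0 / 137.036                 # fine structure constant
HC_EV_M = 1.23984e-6                  # h c, in eV times meters

# Gross energy scale of hydrogen, ~ m alpha^2 (with c = 1).
gross_scale_ev = ELECTRON_REST_ENERGY_EV * ALPHA**2

# Fine-structure energy scale, ~ m alpha^4.
fine_scale_ev = ELECTRON_REST_ENERGY_EV * ALPHA**4

# Wavelength of a photon carrying the fine-structure energy scale.
fine_wavelength_mm = HC_EV_M / fine_scale_ev * 1e3

print("m alpha^2 ~ %.1f eV" % gross_scale_ev)
print("m alpha^4 ~ %.2e eV" % fine_scale_ev)
print("corresponding wavelength ~ %.2f mm" % fine_wavelength_mm)
```

The first scale comes out to a few tens of eV, and the second corresponds to a wavelength somewhat under a millimeter, consistent with the statement that the fine-structure transitions lie in the millimeter region.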


The explanation of the mysterious factor of two had in fact been 
implicit in a 1916 calculation by Willem de Sitter, one of the first 
applications of general relativity. De Sitter treated the earth-moon 
system as a gyroscope, and found the precession of its axis of rota- 
tion, which was partly due to the curvature of spacetime and partly 
due to the type of rotation described earlier in this section. The 
effect on the motion of the moon was noncumulative, and was only 


j / If the crack between the two areas is squashed flat, the
two pieces of the path on the interior coincide, and their
contributions to the precession cancel out ($\mathbf{v} \to -\mathbf{v}$, but
$\mathbf{a} \to +\mathbf{a}$, so $\mathbf{a} \times \mathbf{v} \to -\mathbf{a} \times \mathbf{v}$). Therefore the
precession $\chi$ obtained by going around the outside is equal to the
sum $\chi_1 + \chi_2$ of the precessions that would have been obtained by
going around the two parts.


[Figure k: energy-level diagram of hydrogen, with labels “ground state,” “magnetic,” and “relativistic.”]


k / States in hydrogen are labeled with their $\ell$ and $s$ quantum
numbers, representing their orbital and spin angular momenta
in units of $\hbar$. The state with $s = +1/2$ has its spin angular
momentum aligned with its orbital angular momentum, while the
$s = -1/2$ state has the two angular momenta in opposite
directions. The direction and order of magnitude of the splitting
between the two $\ell = 1$ states is successfully explained by
magnetic interactions with the proton, but the calculated effect
is too big by a factor of 2. The relativistic Thomas precession
cancels out half of the effect.
about one meter, which was much too small to be measured at the
time. In 1927, however, Llewellyn Thomas applied similar reasoning
to the hydrogen atom, with the electron’s spin vector playing
the role of gyroscope. Since gravity is negligible here, the effect has
nothing to do with curvature of spacetime, and Thomas’s effect
corresponds purely to the special-relativistic part of de Sitter’s result.
It is simply the rotation described above, with $\Omega = (1/2)v^2\omega$.
Although Thomas was not the first to calculate it, the effect is known
as Thomas precession. Since the electron’s spin is $\hbar/2$, the energy
splitting is $\pm(\hbar/2)\Omega$, depending on whether the electron’s spin is in
the same direction as its orbital motion, or in the opposite
direction. This is less than the atom’s gross energy scale $\hbar\omega$ by a factor
of $v^2/2$, which is $\sim \alpha^2$. The Thomas precession cancels out half of
the magnetic effect, bringing theory into agreement with experiment.


Uhlenbeck later recalled: “...when I first heard about [the Thomas 
precession], it seemed unbelievable that a relativistic effect could 
give a factor of 2 instead of something of order v/c... Even the
cognoscenti of relativity theory (Einstein included!) were quite sur- 
prised.” 
Problems


1 Suppose that we don’t yet know the exact form of the Lorentz 
transformation, but we know based on the Michelson-Morley exper- 
iment that the speed of light is the same in all inertial frames, and 
we’ve already determined, e.g., by arguments like those on p. 71, 
that there can be no length contraction in the direction perpendic- 
ular to the motion. We construct a “light clock,” consisting simply 
of two mirrors facing each other, with a light pulse bouncing back 
and forth between them. 

(a) Suppose this light clock is moving at a constant velocity v in the
direction perpendicular to its own optical arm, which is of length L.
Use the Pythagorean theorem to prove that the clock experiences a
time dilation given by $\gamma = 1/\sqrt{1 - v^2}$, thereby fixing the time-time
portion of the Lorentz transformation.

(b) Why is it significant for the interpretation of special relativity 
that the result from part a is independent of L? 

(c) Carry out a similar calculation in the case where the clock moves 
with constant acceleration a as measured in some inertial frame. Al- 
though the result depends on L, prove that in the limit of small L, 
we recover the earlier constant-velocity result, with no explicit de- 
pendence on a. 


Remark: Some authors state a “clock postulate” for special relativity, which
says that for a clock that is sufficiently small, the rate at which it runs
depends only on v, not a (except in the trivial sense that v and a are related
by calculus). The result of part c shows that the clock “postulate” is really a
theorem, not a statement that is logically independent of the other postulates
of special relativity. Although this argument only applies to a particular
family of light clocks of various sizes, one can also make any small clock into an
acceleration-insensitive clock, by attaching an accelerometer to it and
applying an appropriate correction to compensate for the clock’s observed sensitivity
to acceleration. (It’s still necessary for the clock to be small, since otherwise
the lack of simultaneity in relativity makes it impossible to describe the whole
clock as having a certain acceleration at a certain instant.) Farley et al.²² have
verified the “clock postulate” to within 2% for the radioactive decay of muons
with $\gamma \sim 12$ being accelerated by magnetic fields at
$5 \times 10^{18}\ \mathrm{m/s^2}$. Some people get confused by this
acceleration-independent property of small clocks and
think that it contradicts the equivalence principle. For a good explanation, see
http://math.ucr.edu/home/baez/physics/Relativity/SR/clock.html.
> Solution, p. 388


²² Nuovo Cimento 45 (1966) 281




[Figure: graph of time difference (ns) versus date, June to July.]


A graph from the paper by Iijima, showing the time difference
between the two clocks.
One clock was kept at Mitaka 
Observatory, at 58 m above sea 
level. The other was moved 
back and forth between a second 
observatory, Norikura Corona 
Station, and the peak of the 
Norikura volcano, 2876 m above 
sea level. The plateaus on the 
graph are data from the periods 
when the clocks were compared 
side by side at Mitaka. The 
difference between one plateau 
and the next is the gravitational 
time dilation accumulated during 
the period when the mobile clock 
was at the top of Norikura. 


2 Some of the most conceptually direct tests of relativistic time 
dilation were carried out by comparing the rates of twin atomic 
clocks, one left on a mountaintop for a certain amount of time, the 
other in a nearby valley below.²³ Unlike the clocks in the
Hafele-Keating experiment, these are stationary for almost the entire
duration of the experiment, so any time dilation is purely gravitational,
not kinematic. One could object, however, that the clocks are not 
really at rest relative to one another, due to the earth’s rotation. 
This is an example of how the distinction between gravitational 
and kinematic time dilations is frame-dependent, since the effect is 
purely gravitational in the rotating frame, where the gravitational 
field is reduced by the fictitious centrifugal force. Show that, in the 
non-rotating frame, the ratio of the kinematic effect to the
gravitational one comes out to be $2.8 \times 10^{-3}$ at the latitude of Tokyo.
This small value indicates that the experiment can be interpreted 
as a very pure test of the gravitational time dilation effect. To cal- 
culate the effect, you will need to use the fact that, as discussed 
on p. 33, gravitational redshifts can be interpreted as gravitational 
time dilations. > Solution, p. 388 


3 (a) On p. 81 (see figure j), we showed that the Thomas pre- 
cession is proportional to area on the velocity disk. Use a similar 
argument to show that the Sagnac effect (p. 73) is proportional to 
the area enclosed by the loop. 

(b) Verify this more directly in the special case of a circular loop. 
(c) Show that a light clock of the type described in problem 1 is 
insensitive to rotation with constant angular velocity. 

(d) Connect these results to the commutativity and transitivity
assumptions in the Einstein clock synchronization procedure described
on p. ??. > Solution, p. 388


4 Example 14 on page 64 discusses relativistic bounds on the 
properties of matter, using the example of pulling a bucket out of a 
black hole. Derive a similar bound by considering the possibility of 
sending signals out of the black hole using longitudinal vibrations of 
a cable, as in the child’s telephone made of two tin cans connected 
by a piece of string. 

Remark: Surprisingly subtle issues can arise in such calculations; see A.Y. 
Shiekh, Can. J. Phys. 70, 458 (1992). For a quantitative treatment of a dangling 
rope in relativity, see Greg Egan, “The Rindler Horizon,”
http://gregegan.customer.netspace.net.au/SCIENCE/Rindler/RindlerHorizon.html.

5 The Maxima program on page 76 demonstrates how to multiply
matrices and find Taylor series. Apply this technique to the
following problem. For successive Lorentz boosts along the same
axis with rapidities $\eta_1$ and $\eta_2$, find the matrix representing the
combined Lorentz transformation, in a Taylor series up to the first
non-classical terms in each matrix element. A mixed Taylor series in
two variables can be obtained simply by nesting taylor functions.
The taylor function will happily work on matrices, not just scalars.
> Solution, p. 388


²³ L. Briatore and S. Leschiutta, “Evidence for the earth gravitational shift by
direct atomic-time-scale comparison,” Il Nuovo Cimento B, 37B (2): 219 (1977).
Iijima et al., “An experiment for the potential blue shift at the Norikura Corona
Station,” Annals of the Tokyo Astronomical Observatory, Second Series, Vol.
XVI, 2 (1978) 68.
Chapter 3
Differential Geometry 


General relativity is described mathematically in the language of 
differential geometry. Let’s take those two terms in reverse order. 


The geometry of spacetime is non-Euclidean, not just in the 
sense that the 3+1-dimensional geometry of Lorentz frames is dif- 
ferent than that of 4 interchangeable Euclidean dimensions, but also 
in the sense that parallels do not behave in the way described by 
E5 or A1-A3. In a Lorentz frame, which describes space without
any gravitational fields, particles whose world-lines are initially par- 
allel will continue along their parallel world-lines forever. But in 
the presence of gravitational fields, initially parallel world-lines of 
free-falling particles will in general diverge, approach, or even cross. 
Thus, neither the existence nor the uniqueness of parallels can be 
assumed. We can’t describe this lack of parallelism as arising from 
the curvature of the world-lines, because we’re using the world-lines 
of free-falling particles as our definition of a “straight” line. Instead, 
we describe the effect as coming from the curvature of spacetime it- 
self. The Lorentzian geometry is a description of the case in which 
this curvature is negligible. 


What about the word differential? The equivalence principle 
states that even in the presence of gravitational fields, local Lorentz 
frames exist. How local is “local?” If we use a microscope to zoom in 
on smaller and smaller regions of spacetime, the Lorentzian approx- 
imation becomes better and better. Suppose we want to do experi- 
ments in a laboratory, and we want to ensure that when we compare 
some physically observable quantity against predictions made based 
on the Lorentz geometry, the resulting discrepancy will not be too 
large. If the acceptable error is $\epsilon$, then we should be able to get the
error down that low if we’re willing to make the size of our
laboratory no bigger than $\delta$. This is clearly very similar to the Weierstrass
style of defining limits and derivatives in calculus. In calculus, the 
idea expressed by differentiation is that every smooth curve can be 
approximated locally by a line; in general relativity, the equivalence 
principle tells us that curved spacetime can be approximated locally 
by flat spacetime. But consider that no practitioner of calculus ha- 
bitually solves problems by filling sheets of scratch paper with ep- 
silons and deltas. Instead, she uses the Leibniz notation, in which dy 
and dx are interpreted as infinitesimally small numbers. You may 
be inclined, based on your previous training, to dismiss infinitesimals
as neither rigorous nor necessary. In 1966, Abraham Robinson
demonstrated that concerns about rigor had been unfounded; we’ll
come back to this point in section 3.3. Although it is true that any
calculation written using infinitesimals can also be carried out using
limits, the following example shows how much better suited the
infinitesimal language is to differential geometry.


a / A vector can be thought of as lying in the plane tangent to
a certain point.


Areas on a sphere Example: 1 
The area of a region S in the Cartesian plane can be calculated
as $\int_S dA$, where $dA = dx\,dy$ is the area of an infinitesimal
rectangle of width dx and height dy. A curved surface such as a sphere
does not admit a global Cartesian coordinate system in which the
constant coordinate curves are both uniformly spaced and
perpendicular to one another. For example, lines of longitude on the
earth’s surface grow closer together as one moves away from the
equator. Letting $\theta$ be the angle with respect to the pole, and $\phi$ the
azimuthal angle, the approximately rectangular patch bounded by
$\theta$, $\theta + d\theta$, $\phi$, and $\phi + d\phi$ has width $r \sin\theta\,d\phi$
and height $r\,d\theta$, giving
$dA = r^2 \sin\theta\,d\theta\,d\phi$. If you look at the corresponding derivation in
an elementary calculus textbook that strictly eschews infinitesi- 
mals, the technique is to start from scratch with Riemann sums. 
This is extremely laborious, and moreover must be carried out 
again for every new case. In differential geometry, the curvature 
of the space varies from one point to the next, and clearly we 
don’t want to reinvent the wheel with Riemann sums an infinite 
number of times, once at each point in space. 
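As a quick sanity check on the area element, the sketch below adds up the patches $dA = r^2 \sin\theta\,d\theta\,d\phi$ over the whole sphere and compares the total with the familiar $4\pi r^2$. The function name, the radius, and the grid sizes are arbitrary choices made for this illustration.

```python
import math

def sphere_area(r, n_theta=400, n_phi=400):
    """Midpoint sum of dA = r^2 sin(theta) dtheta dphi over the whole sphere."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        width = r * math.sin(theta) * d_phi   # patch extent along a line of latitude
        height = r * d_theta                  # patch extent along a line of longitude
        # Every patch in this ring of constant theta has the same area.
        total += n_phi * width * height
    return total

r = 2.0
print("summed patches:", sphere_area(r))
print("4 pi r^2:      ", 4.0 * math.pi * r**2)
```

This is, of course, exactly the kind of Riemann sum the example says we would rather not redo by hand; the point is only that the infinitesimal patch gives the right total.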


3.1 Tangent vectors 


It’s not immediately clear what a vector means in the context of 
curved spacetime. The freshman physics notion of a vector carries 
all kinds of baggage, including ideas like rotation of vectors and a 
magnitude that is positive for nonzero vectors. We also used to 
assume the ability to represent vectors as arrows, i.e., geometrical 
figures of finite size that could be transported to other places — 
but in a curved geometry, it is not in general possible to transport 
a figure to another location without distorting its shape, so there is 
no notion of congruence. For this reason, it’s better to visualize vec- 
tors as tangents to the underlying space, as in figure a. Intuitively, 
we want to think of these vectors as arrows that are infinitesimally 
small, so that they fit on the curved surface without having to be 
bent. In the pictures, we simply scale them up to make them visi- 
ble without an infinitely powerful microscope, and this scaling only 
makes them appear to rise out of the space in which they live. 


A more formal definition of the notion of a tangent vector is 
given on p. 201. 
3.2 Affine notions and parallel transport


3.2.1 The affine parameter in curved spacetime: a rough 
sketch 


We want to be able to measure things in curved spacetime. There 
turn out to be two complementary systems of measurement we can 
apply: affine measure and metric measure. Affine measure in a 
flat geometry was introduced in section 2.1.1, p. 42. Surprisingly, 
it turns out to be quite easy to generalize this to the curved case. 
Our construction of the affine parameter with a scaffolding of par- 
allelograms depended on the existence and uniqueness of parallels 
expressed by axiom A1 on p. 43, so we might imagine that there was 
no point in trying to generalize the construction to curved space- 
time. But the equivalence principle tells us that spacetime is locally 
affine to some approximation. Concretely, clock-time is one exam- 
ple of an affine parameter, and the curvature of spacetime clearly 
can’t prevent us from building a clock and releasing it on a free-fall 
trajectory. 


More generally, we can use the fact that every segment of a 
geodesic is geometrically similar to every other segment. For exam- 
ple, consider an arc of the earth’s equator spanning one degree of 
longitude. That arc could be slid along the equator to a different 
location, then expanded to cover 3 degrees of longitude. The two 
arcs are similar. 


Geodesics are special Example: 2 
The following three non-examples show that this is a special prop- 
erty of geodesics. 


The property is not enjoyed by a non-geodesic curve. A segment 
of a pentagon that encompasses one of the vertices is not similar 
to some other segment that is straight. 


Another non-example involving non-geodesics is the curve that
we get in 1+1-dimensional spacetime by joining together the
positive x axis and the positive t axis. We can never take a one-year
segment of the t axis and, through any combination of boosts
and rotations, make it coincide with a one-light-year piece of the
x axis. The original segment is timelike, and any boost or rotation
will preserve its timelike character.


Furthermore, it is not true in general, when curvature exists, that 
we can take any geometrical figure, transport it wherever we like, 
and also scale it as we like. For example, Euclidean geometry is 
a good approximation on small portions of the Earth’s spherical 
surface, so a roadmap can be made in the shape of a rectangle
with four right-angle corners. However, it is not possible to scale 
up such a rectangle; to map a large portion of the world, we have 
to introduce distortions of the type used in map projections. 


a / Construction of an affine parameter in curved spacetime.


Because geodesics have this special property, we can slide any 
portion of a geodesic to anywhere else on the geodesic and employ it 
as a standard of measure. This gives us a complete system of mea- 
surement along that geodesic, and it works regardless of whether the 
geodesic is timelike, lightlike, or spacelike. But as in flat geometry, 
affine measurement does not allow us to compare lengths along one 
geodesic to lengths along another. 


3.2.2 The affine parameter in more detail 


When we originally defined affine measure in section 2.1.1, p. 42, 
for a flat space, we did it through the explicit construction of a
scaffolding. An important example of the differential, i.e., local, nature
of our geometry is the generalization of the scaffolding construction
to a context broader than affine geometry.


To generalize the recipe for the construction (figure a), the first
obstacle is the ambiguity of the instruction to construct
parallelogram $01q_0q_1$, which requires us to draw $1q_1$ parallel to $0q_0$. Suppose
we construe this as an instruction to make the two segments initially
parallel, i.e., parallel as they depart the line at 0 and 1. By the time
they get to $q_0$ and $q_1$, they may be converging or diverging.


Because parallelism is only approximate here, there will be a
certain amount of error in the construction of the affine parameter.
One way of detecting such an error is that lattices constructed with
different initial distances will get out of step with one another. For
example, we can define $\frac{1}{2}$ as before by requiring that the lattice
constructed with initial segment $0\frac{1}{2}$ line up with the original lattice
at 1. We will find, however, that they do not quite line up at
other points, such as 2. Let’s use this discrepancy $\epsilon = 2 - 2'$ as
a numerical measure of the error. It will depend on both $\delta_1$, the
distance 01, and on $\delta_2$, the distance between 0 and $q_0$. Since $\epsilon$
vanishes for either $\delta_1 = 0$ or $\delta_2 = 0$, and since the equivalence
principle guarantees smooth behavior on small scales, the leading
term in the error will in general be proportional to the product
$\delta_1\delta_2$. In the language of infinitesimals, we can replace $\delta_1$ and $\delta_2$
with infinitesimally short distances, which for simplicity we assume
to be equal, and which we call $d\lambda$. Then the affine parameter $\lambda$
is defined as $\lambda = \int d\lambda$, where the error of order $d\lambda^2$ is, as usual,
interpreted as the negligible discrepancy between the integral and
its approximation as a Riemann sum.
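To see roughly why an error of this order is harmless, here is a sketch of the bookkeeping. The symbols $L$ and $N$ are introduced only for this illustration: suppose the stretch of geodesic being measured has total affine length $L$ and is covered by $N = L/d\lambda$ steps, each contributing an error of order $d\lambda^2$.

```latex
\epsilon_{\text{total}} \sim N\,d\lambda^2
  = \frac{L}{d\lambda}\,d\lambda^2
  = L\,d\lambda
  \;\to\; 0 \quad \text{as } d\lambda \to 0 .
```

The accumulated discrepancy shrinks in proportion to the step size, just as the difference between a Riemann sum and the integral it approximates does.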


3.2.3 Parallel transport 


If you were alert, you may have realized that I cheated you at
a crucial point in this construction. We were to make $1q_1$ and $0q_0$
“initially parallel” as they left 01. How should we even define this
idea of “initially parallel”? We could try to do it by making angles
$q_0\,0\,1$ and $q_1\,1\,2$ equal, but this doesn’t quite work, because it doesn’t
specify whether the angle is to the left or the right on the
two-dimensional plane of the page. In three or more dimensions, the
issue becomes even more serious. The construction workers building
the lattice need to keep it all in one plane, but how do they do that
in curved spacetime?


A mathematician’s answer would be that our geometry lacks 
some additional structure called a connection, which is a rule that 
specifies how one locally flat neighborhood is to be joined seamlessly 
onto another locally flat neighborhood nearby. If you’ve ever bought 
two maps and tried to tape them together to make a big map, you’ve 
formed a connection. If the maps were on a large enough scale, 
you also probably noticed that this was impossible to do perfectly, 
because of the curvature of the earth. 


Physically, the idea is that in flat spacetime, it is possible to 
construct inertial guidance systems like the ones discussed on page 
73. Since they are possible in flat spacetime, they are also possible 
in locally flat neighborhoods of spacetime, and they can then be 
carried from one neighborhood to another. 


In three space dimensions, a gyroscope’s angular momentum vector
maintains its direction, and we can orient other vectors, such as
$1q_1$, relative to it. Suppose for concreteness that the construction
of the affine parameter above is being carried out in three space
dimensions. We place a gyroscope at 0, orient its axis along $0q_0$, slide
it along the line to 1, and then construct $1q_1$ along that axis.


In 3+1 dimensions, a gyroscope only does part of the job. We 
now have to maintain the direction of a four-dimensional vector. 
Four-vectors will not be discussed in detail until section 4.2, but 
similar devices can be used to maintain their orientations in space- 
time. These physical devices are ways of defining a mathematical 
notion known as parallel transport, which allows us to take a vector 
from one point to another in space. In general, specifying a notion 
of parallel transport is equivalent to specifying a connection. 


Parallel transport is path-dependent, as shown in figure b. 
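The path dependence can be made concrete with a small numerical sketch on the unit sphere. The positions of A, B, and C below are assumptions chosen for convenience (the north pole and two points on the equator a quarter-turn apart), not coordinates read off figure b. The sketch uses the fact that parallel transport along a great-circle arc is a rigid rotation about the axis perpendicular to the plane of that arc.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def transport(vec, p, q):
    """Parallel-transport the tangent vector vec from p to q along the
    great-circle arc joining them on the unit sphere."""
    axis = cross(p, q)
    s = math.sqrt(dot(axis, axis))     # sine of the arc's angle
    c = dot(p, q)                      # cosine of the arc's angle
    k = tuple(x / s for x in axis)     # unit rotation axis
    kxv = cross(k, vec)
    kv = dot(k, vec)
    # Rodrigues rotation formula for a rotation about k by the arc's angle.
    return tuple(vec[i] * c + kxv[i] * s + k[i] * kv * (1.0 - c)
                 for i in range(3))

# Assumed positions: A at the north pole, B and C on the equator.
A = (0.0, 0.0, 1.0)
B = (1.0, 0.0, 0.0)
C = (0.0, 1.0, 0.0)
v0 = (1.0, 0.0, 0.0)   # a tangent vector at A

via_B = transport(transport(v0, A, B), B, C)
direct = transport(v0, A, C)

cos_angle = max(-1.0, min(1.0, dot(via_B, direct)))
angle = math.degrees(math.acos(cos_angle))
print("transported along ABC:", via_B)
print("transported along AC: ", direct)
print("angle between results (degrees):", angle)
```

For this particular triangle, which has three right angles, the two results differ by a quarter turn.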


Affine parameters defined only along geodesics 


In the context of flat spacetime, the affine parameter was defined 
only along lines, not arbitrary curves, and could not be compared 
between lines running in different directions. In curved spacetime, 
the same limitation is present, but with “along lines” replaced by 
“along geodesics.” Figure c shows what goes wrong if we try to 
apply the construction to a world-line that isn’t a geodesic. One 
definition of a geodesic is that it’s the course we’ll end up following 
if we navigate by keeping a fixed bearing relative to an inertial
guidance device such as a gyroscope; that is, the tangent to a geodesic,
when parallel-transported farther along the geodesic, is still tangent.
A non-geodesic curve lacks this property, and the effect on the
construction of the affine parameter is that the segments $nq_n$ drift more
and more out of alignment with the curve.


b / Parallel transport is path-dependent. On the surface of
this sphere, parallel-transporting a vector along ABC gives a
different answer than transporting it along AC.


c / Bad things happen if we try to construct an affine parameter
along a curve that isn’t a geodesic. This curve is similar
to path ABC in figure b. Parallel transport doesn’t preserve
the vectors’ angle relative to the curve, as it would with a
geodesic. The errors in the construction blow up in a way
that wouldn’t happen if the curve had been a geodesic. The fourth
dashed parallel flies off wildly around the back of the sphere,
wrapping around and meeting the curve at a point, 4, that is
essentially random.


a / Tullio Levi-Civita (1873-1941) worked on models of
number systems possessing infinitesimals and on differential
geometry. He invented the tensor notation, which Einstein
learned from his textbook. He was appointed to prestigious
endowed chairs at Padua and the University of Rome, but was fired
in 1938 because he was a Jew and an anti-fascist.


3.3 Models


A typical first reaction to the phrase “curved spacetime” — or even 
“curved space,” for that matter — is that it sounds like nonsense. 
How can featureless, empty space itself be curved or distorted? The 
concept of a distortion would seem to imply taking all the points 
and shoving them around in various directions as in a Picasso paint- 
ing, so that distances between points are altered. But if space has 
no identifiable dents or scratches, it would seem impossible to deter- 
mine which old points had been sent to which new points, and the 
distortion would have no observable effect at all. Why should we 
expect to be able to build differential geometry on such a logically 
dubious foundation? Indeed, historically, various mathematicians 
have had strong doubts about the logical self-consistency of both 
non-Euclidean geometry and infinitesimals. And even if an authori- 
tative source assures you that the resulting system is self-consistent, 
its mysterious and abstract nature would seem to make it difficult 
for you to develop any working picture of the theory that could 
play the role that mental sketches of graphs play in organizing your 
knowledge of calculus. 


Models provide a way of dealing with both the logical issues and 
the conceptual ones. Figure a on page 90 “pops” off of the page, 
presenting a strong psychological impression of a curved surface ren- 
dered in perspective. This suggests finding an actual mathematical 
object, such as a curved surface, that satisfies all the axioms of a 
certain logical system, such as non-Euclidean geometry. Note that 
the model may contain extrinsic elements, such as the existence of 
a third dimension, that are not connected to the system being mod- 
eled. 


Let’s focus first on consistency. In general, what can we say 
about the self-consistency of a mathematical system? To start with, 
we can never prove anything about the consistency or lack of consis- 
tency of something that is not a well-defined formal system, e.g., the 
Bible. Even Euclid’s Elements, which was a model of formal rigor for 
thousands of years, is loose enough to allow considerable ambiguity. 
If you’re inclined to scoff at the silly Renaissance mathematicians 
who kept trying to prove the parallel postulate E5 from postulates 
E1-E4, consider the following argument. Suppose that we replace 
E5 with E5’, which states that parallels don’t exist: given a line and 
a point not on the line, no line can ever be drawn through the point 
and parallel to the given line. In the new system of plane geometry 
E’ consisting of E1-E4 plus E5’, we can prove a variety of theorems, 
and one of them is that there is an upper limit on the area of any fig- 
ure. This imposes a limit on the size of circles, and that appears to 
contradict E3, which says we can construct a circle with any radius. 
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We therefore conclude that E’ lacks self-consistency. Oops! As your 
high school geometry text undoubtedly mentioned in passing, E’ is 
a perfectly respectable system called elliptic geometry. So what’s 
wrong with this supposed proof of its lack of self-consistency? The 
issue is the exact statement of E3. E3 does not say that we can 
construct a circle given any real number as its radius. Euclid could 
not have intended any such interpretation, since he had no notion of 
real numbers. To Euclid, geometry was primary, and numbers were 
geometrically constructed objects, being represented as lengths, an- 
gles, areas, and volumes. A literal translation of Euclid’s statement 
of the axiom is “To describe a circle with any center and distance.”¹ 
“Distance” means a line segment. There is therefore no contradiction 
in E’, because E’ has a limit on the lengths of line segments. 


Now suppose that such ambiguities have been eliminated from 
the system’s basic definitions and axioms. In general, we expect 
it to be easier to prove an inconsistent system’s inconsistency than 
to demonstrate the consistency of a consistent one. In the former 
case, we can start cranking out theorems, and if we can find a way 
to prove both proposition P and its negation ¬P, then obviously 
something is wrong with the system. One might wonder whether 
such a contradiction could remain contained within one corner of 
the system, like nuclear waste. It can’t. Aristotelian logic allows 
proof by contradiction: if we prove both P and ¬P based on certain 
assumptions, then our assumptions must have been wrong. If we 
can prove both P and ¬P without making any assumptions, then 
proof by contradiction allows us to establish the truth of any randomly 
chosen proposition. Thus a single contradiction is sufficient, 
in Aristotelian logic, to invalidate the entire system. This goes by 
the Latin rubric ex falso quodlibet, meaning “from a falsehood, whatever 
you please.” 
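The claim that a contradiction lets us prove anything can be spot-checked with a truth table. The following is only a minimal sketch in classical propositional logic, not a formal derivation; the function names are hypothetical and chosen for illustration. It confirms that (P and not P) implies Q holds for every assignment of truth values.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b

def explosion_is_tautology() -> bool:
    """Check (P and not P) -> Q for every truth assignment of P and Q."""
    return all(
        implies(p and (not p), q)
        for p, q in product([False, True], repeat=2)
    )

print(explosion_is_tautology())
```

The premise P and not P is never true, so the implication can never fail, whatever Q says.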


Proving consistency is harder. If you’re mathematically sophisticated, 
you may be tempted to leap directly to Gödel’s theorem, and 
state that nobody can ever prove the self-consistency of a mathematical 
system. This would be a misapplication of Gödel. Gödel’s theorem 
only applies to mathematical systems that meet certain technical 
criteria, and some of the interesting systems we’re dealing with 
don’t meet those criteria; in particular, Gödel’s theorem doesn’t 
apply to Euclidean geometry, and Euclidean geometry was proved 
self-consistent by Tarski and his students around 1950. Furthermore, 
we usually don’t require an absolute proof of self-consistency. 
Usually we’re satisfied if we can prove that a certain system, such 
as elliptic geometry, is at least as self-consistent as another system, 
such as Euclidean geometry. This is called equiconsistency. The 
general technique for proving equiconsistency of two theories is to 
show that a model of one can be constructed within the other. 


¹Heath, pp. 195-202 


Suppose, for example, that we construct a geometry in which the 
space of points is the surface of a sphere, and lines are understood 
to be the geodesics, i.e., the great circles whose centers coincide at 
the sphere’s center. This geometry, called spherical geometry, is 
useful in cartography and navigation. It is non-Euclidean, as we 
can demonstrate by exhibiting at least one proposition that is false 
in Euclidean geometry. For example, construct a triangle on the 
earth’s surface with one corner at the north pole, and the other 
two at the equator, separated by 90 degrees of longitude. The sum 
of its interior angles is 270 degrees, contradicting Euclid, book I, 
proposition 32. Spherical geometry must therefore violate at least 
one of the axioms E1-E5, and indeed it violates both E1 (because 
no unique line is determined by two antipodal points such as the 
north and south poles) and E5 (because parallels don’t exist at all). 
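The 270-degree triangle can be checked numerically. This is a rough sketch, assuming a unit sphere embedded in Euclidean 3-space (an extrinsic convenience) and hypothetical helper names. The angle at each vertex is measured between the tangent directions of the two great-circle arcs that meet there.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tangent(v, u):
    """Unit tangent at point v of the great-circle arc from v toward u."""
    d = dot(u, v)
    t = tuple(ui - d * vi for ui, vi in zip(u, v))  # remove the radial part
    n = math.sqrt(dot(t, t))
    return tuple(ti / n for ti in t)

def vertex_angle(v, u, w):
    """Interior angle in degrees at v of the spherical triangle v, u, w."""
    c = dot(tangent(v, u), tangent(v, w))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def angle_sum(a, b, c):
    return vertex_angle(a, b, c) + vertex_angle(b, c, a) + vertex_angle(c, a, b)

north_pole = (0.0, 0.0, 1.0)
equator_0 = (1.0, 0.0, 0.0)    # longitude 0 on the equator
equator_90 = (0.0, 1.0, 0.0)   # longitude 90 degrees on the equator
print(angle_sum(north_pole, equator_0, equator_90))
```

Each of the three corners comes out as a right angle, which is where the excess over 180 degrees comes from.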


A closely related construction gives a model of elliptic geometry, 
in which E1 holds, and only E5 is thrown overboard. To accomplish 
this, we model a point using a diameter of the sphere,² and a line as 
the set of all diameters lying in a certain plane. This has the effect 
of identifying antipodal points, so that there is now no violation of 
E1. Roughly speaking, this is like lopping off half of the sphere, but 
making the edges wrap around. Since this model of elliptic geometry 
is embedded within a Euclidean space, all the axioms of elliptic 
geometry can now be proved as theorems in Euclidean geometry. If a 
contradiction arose from them, it would imply a contradiction in the 
axioms of Euclidean geometry. We conclude that elliptic geometry 
is equiconsistent with Euclidean geometry. This was known long 
before Tarski’s 1950 proof of Euclidean geometry’s self-consistency, 
but since nobody was losing any sleep over hidden contradictions 
in Euclidean geometry, mathematicians stopped wasting their time 
looking for contradictions in elliptic geometry. 


Infinitesimals Example: 3 
Consider the following axiomatically defined system of numbers: 


1. It is a field, i.e., it has addition, subtraction, multiplication, and 
division with the usual properties. 


2. It is an ordered geometry in the sense of O1-O4 on p. 19, and 
the ordering relates to addition and multiplication in the usual 
way. 


3. Existence of infinitesimals: There exists a positive number d 
such that d < 1, d < 1/2, d < 1/3, ... 


A model of this system can be constructed within the real number 
system by defining d as the identity function d(x) = x and forming 
the set of functions of the form f(d) = P(d)/Q(d), where P and Q 
are polynomials with real coefficients. The ordering of functions f 
and g is defined according to the sign of $\lim_{x\to 0^+} [f(x) - g(x)]$. Axioms 
1-3 can all be proved from the real-number axioms. Therefore 
this system, which includes infinitesimals, is equiconsistent 
with the reals. More elaborate constructions can extend this to 
systems that have more of the properties of the reals, and a 
browser-based calculator that implements such a system is available 
at lightandmatter.com/calc/inf. Abraham Robinson extended 
this in 1966 to all of analysis, and thus there is nothing intrinsically 
nonrigorous about doing analysis in the style of Gauss 
and Euler, with symbols like dx representing infinitesimally small 
quantities.³ 


²The term “elliptic” may be somewhat misleading here. The model is still 
constructed from a sphere, not an ellipsoid. 
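The ordering in example 3 can be made concrete with a few lines of code. What follows is a minimal sketch, not the calculator mentioned above. It assumes that “the sign of the limit” is read as the sign of f(x) − g(x) for sufficiently small positive x, which for a rational function is fixed by the lowest-order nonzero coefficients of numerator and denominator. All names here are hypothetical.

```python
from fractions import Fraction

# A polynomial is a list of coefficients, lowest power first.
# A rational function is a pair (numerator, denominator).

def pmul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def psub(a, b):
    n = max(len(a), len(b))
    a = list(a) + [Fraction(0)] * (n - len(a))
    b = list(b) + [Fraction(0)] * (n - len(b))
    return [x - y for x, y in zip(a, b)]

def sign_near_zero_plus(p):
    """Sign of p(x) for small positive x: the lowest-order nonzero term wins."""
    for c in p:
        if c != 0:
            return 1 if c > 0 else -1
    return 0

def less(f, g):
    """True if f < g in the ordering of example 3."""
    (pf, qf), (pg, qg) = f, g
    num = psub(pmul(pf, qg), pmul(pg, qf))   # numerator of f - g
    den = pmul(qf, qg)                       # denominator of f - g
    return sign_near_zero_plus(num) * sign_near_zero_plus(den) < 0

def const(c):
    return ([Fraction(c)], [Fraction(1)])

d = ([Fraction(0), Fraction(1)], [Fraction(1)])   # the identity function d(x) = x
d_squared = (pmul(d[0], d[0]), [Fraction(1)])

print(less(const(0), d), less(d, const(Fraction(1, 1000))), less(d_squared, d))
```

The checks cover only finitely many n, so they illustrate axiom 3 rather than prove it; the proof is the one-line observation that x − 1/n is negative for small enough positive x.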


Besides proving consistency, these models give us insight into 
what’s going on. The model of elliptic geometry suggests an in- 
sight into the reason that there is an upper limit on lengths and 
areas: it is because the space wraps around on itself. The model of 
infinitesimals suggests a fact that is not immediately obvious from 
the axioms: the infinitesimal quantities compose a hierarchy, so that 
for example 7d is in finite proportion to d, while d² is like a “lesser 
flea” in Swift’s doggerel: “Big fleas have little fleas/ On their backs 
to ride ’em,/ and little fleas have lesser fleas,/And so, ad infinitum.” 


Spherical and elliptic geometry are not valid models of a general- 
relativistic spacetime, since they are locally Euclidean rather than 
Lorentzian, but they still provide us with enough conceptual guid- 
ance to come up with some ideas that might never have occurred to 
us otherwise: 


• In spherical geometry, we can have a two-sided polygon called 
a lune that encloses a nonzero area. In general relativity, a 
lune formed by the world-lines of two particles represents mo- 
tion in which the particles separate but are later reunited, 
presumably because of some mass between them that created 
a gravitational field. An example is gravitational lensing. 


• Both spherical models wrap around on themselves, so that 
they are not topologically equivalent to infinite planes. We 
therefore form a conjecture that there may be a link between curvature, 
which is a local property, and topology, which is global. 
Such a connection is indeed observed in relativity. For exam- 
ple, cosmological solutions of the equations of general relativ- 
ity come in two flavors. One type has enough matter in it to 
produce more than a certain critical amount of curvature, and 
this type is topologically closed. It describes a universe that 
has finite spatial volume, and that will only exist for a finite 
time before it recontracts in a Big Crunch. The other type, 
corresponding to the universe we actually inhabit, has infinite 
spatial volume, will exist for infinite time, and is topologically 
open. 


³More on this topic is available in, for example, Keisler’s Elementary Calculus: 
An Infinitesimal Approach, Stroyan’s A Brief Introduction to Infinitesimal 
Calculus, or my own Calculus, all of which are available for free online. 


b / An Einstein’s ring is formed when there is a chance alignment 
of a distant source with a closer gravitating body. Here, a quasar, 
MG1131+0456, is seen as a ring due to focusing of light by 
an unknown object, possibly a supermassive black hole. Because 
the entire arrangement lacks perfect axial symmetry, the 
ring is nonuniform; most of its brightness is concentrated in two 
lumps on opposite sides. This type of gravitational lensing is 
direct evidence for the curvature of space predicted by general 
relativity. The two geodesics form a lune, which is a figure 
that cannot exist in Euclidean geometry. 


• There is a distance scale set by the size of the sphere, with its 
inverse being a measure of curvature. In general relativity, 
we expect there to be a similar way to measure curvature 
numerically, although the curvature may vary from point to 
point. 


Self-check: Prove from the axioms E’ that elliptic geometry, un- 
like spherical geometry, cannot have a lune with two distinct ver- 
tices. Convince yourself nevertheless, using the spherical model of 
E’, that it is possible in elliptic geometry for two lines to enclose a 
region of space, in the sense that from any point P in the region, 
a ray emitted in any direction must intersect one of the two lines. 
Summarize these observations with a characterization of lunes in 
elliptic geometry versus lunes in spherical geometry. 


3.4 Intrinsic quantities 


Models can be dangerous, because they can tempt us to impute 
physical reality to features that are purely extrinsic, i.e., that are 
only present in that particular model. This is as opposed to intrinsic 
features, which are present in all models, and which are therefore 
logically implied by the axioms of the system itself. The existence 
of lunes is clearly an intrinsic feature of non-Euclidean geometries, 
because intersection of lines was defined before any model had even 
been proposed. 


Curvature in elliptic geometry Example: 4 
What about curvature? In the spherical model of elliptic geom- 
etry, the size of the sphere is an inverse measure of curvature. 
Is this a valid intrinsic quantity, or is it extrinsic? It seems sus- 
pect, because it is a feature of the model. If we try to define 
“size” as the radius R of the sphere, there is clearly reason for 
concern, because this seems to refer to the center of the sphere, 
but existence of a three-dimensional Euclidean space inside and 
outside the surface is clearly an extrinsic feature of the model. 
There is, however, a way in which a creature confined to the surface 
can determine R, by constructing a geodesic and an affine 
parameter along that geodesic, and measuring the distance λ accumulated 
until the geodesic returns to the initial point. Since 
antipodal points are identified, λ equals half the circumference of 
the sphere, not its whole circumference, so R = λ/π, by wholly 
intrinsic methods. 


Extrinsic curvature Example: 5 
Euclid’s axioms E1-E5 refer to explicit constructions. If a two-dimensional 
being can physically verify them all as descriptions of 
the two-dimensional space she inhabits, then she knows that her 
space is Euclidean, and that propositions such as the Pythagorean 
theorem are physically valid in her universe. But the diagram in 
a/1 illustrating the proof of the Pythagorean theorem in 
Euclid’s Elements (proposition 1.47) is equally valid if the page is 
rolled onto a cylinder, 2, or formed into a wavy corrugated shape, 
3. These types of curvature, which can be achieved without tear- 
ing or crumpling the surface, are extrinsic rather than intrinsic. Of 
the curved surfaces in figure a, only the sphere, 4, has intrinsic 
curvature; the diagram can’t be plastered onto the sphere without 
folding or cutting and pasting. 


a/ Example 5. 


Self-check: How would the ideas of example 5 apply to a cone? 


Example 5 shows that it can be difficult to sniff out bogus ex- 
trinsic features that seem intrinsic, and example 4 suggests the de- 
sirability of developing methods of calculation that never refer to 
any extrinsic quantities, so that we never have to worry whether a 
symbol like R staring up at us from a piece of paper is intrinsic. 
This is why it is unlikely to be helpful to a student of general rel- 
ativity to pick up a book on differential geometry that was written 
without general relativity specifically in mind. Such books have a 
tendency to casually mix together intrinsic and extrinsic notation. 
For example, a vector cross product a × b refers to a vector poking 
out of the plane occupied by a and b, and the space outside the 
plane may be extrinsic; it is not obvious how to generalize this op- 
eration to the 3+1 dimensions of relativity (since the cross product 
is a three-dimensional beast), and even if it were, we could not be 
assured that it would have any intrinsically well defined meaning. 


3.4.1 Coordinate independence 


To see how to proceed in creating a manifestly intrinsic notation, 
consider the two types of intrinsic observations that are available in 
general relativity: 


1. We can tell whether events and world-lines are incident: 
whether or not two lines intersect, two events coincide, or an 
event lies on a certain line. 


Incidence measurements, for example detection of gravitational lensing, 
are global, but they are the only global observations we can 
do.⁴ If we were limited entirely to incidence, spacetime would be 
described by the austere system of projective geometry, a geometry 
without parallels or measurement. In projective geometry, all propo- 
sitions are essentially statements about combinatorics, e.g., that it 
is impossible to plant seven trees so that they form seven lines of 
three trees each. 


But: 
2. We can also do measurements in local Lorentz frames. 


This gives us more power, but not as much as we might expect. 
Suppose we define a coordinate such as t or x. In Newtonian me- 
chanics, these coordinates would form a predefined background, a 
preexisting stage for the actors. In relativity, on the other hand, 
consider a completely arbitrary change of coordinates of the form 
x → x′ = f(x), where f is a smooth one-to-one function. For example, 
we could have x → x + px³ + q sin(rx) (with p and q chosen 
small enough so that the mapping is always one-to-one). Since the 
mapping is one-to-one, the new coordinate system preserves all the 
incidence relations. Since the mapping is smooth, the new coordi- 
nate system is still compatible with the existence of local Lorentz 
frames. The difference between the two coordinate systems is there- 
fore entirely extrinsic, and we conclude that a manifestly intrinsic 
notation should avoid any explicit reference to a coordinate system. 
That is, if we write a calculation in which a symbol such as x ap- 
pears, we need to make sure that nowhere in the notation is there 
any hidden assumption that x comes from any particular coordinate 
system. For example, the equation should still be valid if the generic 
symbol x is later taken to represent the distance r from some center 
of symmetry. This coordinate-independence property is also known 
as general covariance, and this type of smooth change of coordinates 
is also called a diffeomorphism. 
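To see what “small enough” might mean for the example x → x + px³ + q sin(rx), note that a smooth map of the line is one-to-one if its derivative never changes sign. The sketch below is only a sampled check on a finite grid, not a proof. The sufficient condition p ≥ 0 and |qr| < 1 is my own reading of the requirement, and the function names are hypothetical.

```python
import math

def f(x, p, q, r):
    """The example coordinate change x -> x + p x^3 + q sin(r x)."""
    return x + p * x**3 + q * math.sin(r * x)

def df(x, p, q, r):
    """Derivative of f with respect to x."""
    return 1 + 3 * p * x**2 + q * r * math.cos(r * x)

def is_increasing_on(xs, p, q, r):
    """True if df > 0 at every sample point, suggesting f is one-to-one there."""
    return all(df(x, p, q, r) > 0 for x in xs)

grid = [i * 0.01 - 50.0 for i in range(10001)]   # samples on [-50, 50]
print(is_increasing_on(grid, 0.1, 0.5, 1.5))     # q*r = 0.75, derivative stays positive
print(is_increasing_on(grid, 0.0, 2.0, 1.0))     # q*r = 2, derivative dips below zero
```

With p ≥ 0 the derivative is at least 1 − |qr|, so |qr| < 1 keeps it positive everywhere; when the condition fails, the new coordinate doubles back and assigns the same value to different points.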


The Dehn twist Example: 6 
As an exotic example of a change of coordinates, take a torus 
and label it with coordinates (θ, φ), where θ + 2π is taken to be the 
same as θ, and similarly for φ. Now subject it to the coordinate 
transformation T defined by θ → θ + φ, which is like opening the 
torus, twisting it by a full circle, and then joining the ends back 
together. T is known as the “Dehn twist,” and it is different from 
most of the coordinate transformations we do in relativity because 
it can’t be done smoothly, i.e., there is no continuous function f(x) 
on 0 ≤ x ≤ 1 such that every value of f is a smooth coordinate 
transformation, f(0) is the identity transformation, and f(1) = T. 


⁴Einstein referred to incidence measurements as “determinations of space-time 
coincidences.” For his presentation of this idea, see p. ??. 


Frames moving at c? 


A good application of these ideas is to the question of what the 
world would look like in a frame of reference moving at the speed 
of light. This question has a long and honorable history. As a 
young student, Einstein tried to imagine what an electromagnetic 
wave would look like from the point of view of a motorcyclist riding 
alongside it. We now know, thanks to Einstein himself, that it really 
doesn’t make sense to talk about such observers. 


The most straightforward argument is based on the positivist 
idea that concepts only mean something if you can define how to 
measure them operationally. If we accept this philosophical stance 
(which is by no means compatible with every concept we ever discuss 
in physics), then we need to be able to physically realize this frame 
in terms of an observer and measuring devices. But we can’t. It 
would take an infinite amount of energy to accelerate Einstein and 
his motorcycle to the speed of light. 


Since arguments from positivism can often kill off perfectly in- 
teresting and reasonable concepts, we might ask whether there are 
other reasons not to allow such frames. There are. Recall that 
we placed two technical conditions on coordinate transformations: 
they are supposed to be smooth and one-to-one. The smoothness 
condition is related to the inability to boost Einstein’s motorcycle 
into the speed-of-light frame by any continuous, classical process. 
(Relativity is a classical theory.) But independent of that, we have 
a problem with the one-to-one requirement. Figure b shows what 
happens if we do a series of Lorentz boosts to higher and higher 
velocities. It should be clear that if we could do a boost up to a ve- 
locity of c, we would have effected a coordinate transformation that 
was not one-to-one. Every point in the plane would be mapped onto 
a single lightlike line. 
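The collapse onto a lightlike line can be seen numerically. This is a rough sketch in 1+1 dimensions with units where c = 1; the sign convention for the boost and the function names are my assumptions. Under a boost with velocity v, the combination t + x is multiplied by √((1−v)/(1+v)), which goes to zero as v approaches 1, while the area of any figure is preserved.

```python
import math

def boost(v, t, x):
    """Lorentz boost with velocity v (|v| < 1, c = 1) applied to the event (t, x)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

def squash_factor(v):
    """Factor by which t + x shrinks under the boost."""
    return math.sqrt((1.0 - v) / (1.0 + v))

def boost_determinant(v):
    """Determinant of the boost matrix, i.e. the area scaling factor."""
    a = boost(v, 1.0, 0.0)
    b = boost(v, 0.0, 1.0)
    return a[0] * b[1] - a[1] * b[0]

for v in (0.5, 0.9, 0.999):
    t_new, x_new = boost(v, 0.3, 0.7)
    print(v, t_new + x_new, squash_factor(v), boost_determinant(v))
```

For every v < 1 the map is invertible with determinant 1. In the limit the component off the line t + x = 0 vanishes, so the limiting map could not be one-to-one.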


b / A series of Lorentz boosts acts on a square. 


3.5 The metric 


Consider a coordinate x defined along a certain curve, which is not 
necessarily a geodesic. For concreteness, imagine this curve to exist 
in two spacelike dimensions, which we can visualize as the surface 
of a sphere embedded in Euclidean 3-space. These concrete features 
are not strictly necessary, but they drive home the point that we 
should not expect to be able to define x so that it varies at a steady 
rate with elapsed distance; for example, we know that it will not be 
possible to define a two-dimensional Cartesian grid on the surface 
of a sphere. In the figure, the tick marks are therefore not evenly 
spaced. This is perfectly all right, given the coordinate invariance of 
general relativity. Since the incremental changes in x are equal, I’ve 
represented them below the curve as little vectors of equal length. 
They are the wrong length to represent distances along the curve, 
but this wrongness is an inevitable fact of life in relativity. 


a / The tick marks on the line define a coordinate measured 
along the line. It is not possible to set up such a coordinate system 
globally so that the coordinate is uniform everywhere. The 
arrows represent changes in the value of the coordinate; since the 
changes in the coordinate are all equal, the arrows are all the 
same length. 


b / The vectors →dx and dx→ are duals of each other. 


Now suppose we want to integrate the arc length of a segment 
of this curve. The little vectors are infinitesimal. In the integrated 
length, each little vector should contribute some amount, which is 
a scalar. This scalar is not simply the magnitude of the vector, 
ds ≠ √(dx·dx), since the vectors are the wrong length. Figure a 
is clearly reminiscent of the geometrical picture of vectors and dual 
vectors developed on p. 48. But the purely affine notion of vectors 
and their duals is not enough to define the length of a vector in 
general; it is only sufficient to define a length relative to other lengths 
along the same geodesic. When vectors lie along different geodesics, 
we need to be able to specify the additional conversion factor that 
allows us to compare one to the other. The piece of machinery that 
allows us to do this is called a metric. 


Fixing a metric allows us to define the proper scaling of the tick 
marks relative to the arrows at a given point, i.e., in the birdtracks 
notation it gives us a natural way of taking a displacement vector 
such as →s, with the arrow pointing into the symbol, and making 
a corresponding dual vector s→, with the arrow coming out. This 
is a little like cloning a person but making the clone be of the opposite 
sex. Hooking them up like s→s then tells us the squared 
magnitude of the vector. For example, if →dx is an infinitesimal 
timelike displacement, then dx→dx is the squared time interval dx² 
measured by a clock traveling along that displacement in spacetime. 
(Note that in the notation dx², it’s clear that dx is a scalar, because 
unlike →dx and dx→ it doesn’t have any arrow coming in or out 
of it.) Figure b shows the resulting picture. 


In the abstract index notation introduced on p. 50, the vectors 
→dx and dx→ are written $dx^a$ and $dx_a$. When a specific coordinate 
system has been fixed, we write these with concrete, Greek indices, 
$dx^\mu$ and $dx_\mu$. In an older and conceptually incompatible notation 
and terminology due to Sylvester (1853), one refers to $dx^\mu$ as a contravariant 
vector, and $dx_\mu$ as covariant. The confusing terminology 
is summarized on p. 413. 


The assumption that a metric exists is nontrivial. There is no 
metric in Galilean spacetime, for example, since in the limit c → ∞ 
the units used to measure timelike and spacelike displacements are 
not comparable. Assuming the existence of a metric is equivalent to 
assuming that the universe holds at least one physically manipulable 
clock or ruler that can be moved over long distances and accelerated 
as desired. In the distant future, large and causally isolated regions 
of the cosmos may contain only massless particles such as photons, 
which cannot be used to build clocks (or, equivalently, rulers); the 
physics of these regions will be fully describable without a metric. 
If, on the other hand, our world contains not just zero or one but 
two or more clocks, then the metric hypothesis requires that these 
clocks maintain a consistent relative rate when accelerated along 
the same world-line. This consistency is what allows us to think 
of relativity as a theory of space and time rather than a theory of 
clocks and rulers. There are other relativistic theories of gravity 
besides general relativity, and some of these violate this hypothesis. 


Given a $dx^\mu$, how do we find its dual $dx_\mu$, and vice versa? In 
one dimension, we simply need to introduce a real number g as a 
correction factor. If one of the vectors is shorter than it should be 
in a certain region, the correction factor serves to compensate by 
making its dual proportionately longer. The two possible mappings 
(covariant to contravariant and contravariant to covariant) are ac- 
complished with factors of g and 1/g. The number g is the metric, 
and it encodes all the information about distances. For example, if 
φ represents longitude measured at the arctic circle, then the metric 
is the only source for the datum that a displacement dφ corresponds 
to 2540 km per radian. 
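The figure of roughly 2540 km per radian can be reproduced from the radius of the earth and the latitude of the arctic circle. This is a back-of-the-envelope sketch: the mean radius of 6371 km and the latitude of 66.5 degrees are values I have assumed, not numbers taken from the text, and the function names are hypothetical. The one-dimensional metric g is then the square of the conversion factor, so that ds = √g dφ.

```python
import math

EARTH_RADIUS_KM = 6371.0        # assumed mean radius of the earth
ARCTIC_CIRCLE_LAT_DEG = 66.5    # assumed approximate latitude

def km_per_radian_of_longitude(lat_deg, radius_km=EARTH_RADIUS_KM):
    """Distance along a circle of constant latitude per radian of longitude."""
    return radius_km * math.cos(math.radians(lat_deg))

def metric_g(lat_deg, radius_km=EARTH_RADIUS_KM):
    """One-dimensional metric for the longitude coordinate, in km^2 per radian^2."""
    return km_per_radian_of_longitude(lat_deg, radius_km) ** 2

print(km_per_radian_of_longitude(ARCTIC_CIRCLE_LAT_DEG))
print(math.sqrt(metric_g(ARCTIC_CIRCLE_LAT_DEG)))
```

The same coordinate displacement dφ corresponds to the full 6371 km per radian at the equator, which is the sense in which the coordinate alone carries no information about distance.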


Now let’s generalize to more than one dimension. Because glob- 
ally Cartesian coordinate systems can’t be imposed on a curved 
space, the constant-coordinate lines will in general be neither evenly 
spaced nor perpendicular to one another. If we construct a local 
set of basis vectors lying along the intersections of the constant- 
coordinate surfaces, they will not form an orthonormal set. We 
would like to have an expression of the form $ds^2 = \sum_\mu dx^\mu dx_\mu$ for 
the squared arc length, and using the Einstein summation notation 
this becomes 


$ds^2 = dx^\mu dx_\mu .$ 


3.5.1 The Euclidean metric 


For Cartesian coordinates in a Euclidean plane, where one doesn’t 
normally bother with the distinction between covariant and contravariant 
vectors, this expression for $ds^2$ is simply the Pythagorean 
theorem, summed over two values of $\mu$ for the two coordinates: 

$ds^2 = dx^\mu dx_\mu = dx^2 + dy^2$ 


The symbols $dx$, $ds^0$, $dx^0$, and $dx_0$ are all synonyms, and likewise 
for $dy$, $ds^1$, $dx^1$, and $dx_1$. (Because notations such as $ds^1$ force the 
reader to keep track of which digits have been assigned to which 
letters, it is better practice to use notation such as $dy$ or $ds^y$; the 
latter notation could in principle be confused with one in which y 
was a variable taking on values such as 0 or 1, but in reality we 
understand it from context, just as we understand that the d’s in 
dy/dx are not referring to some variable d that stands for a number.) 


In the non-Euclidean case, the Pythagorean theorem is false; $dx^\mu$ 
and $dx_\mu$ are no longer synonyms, so their product is no longer simply 
the square of a distance. To see this more explicitly, let’s write the 
expression so that only the contravariant quantities occur. By local 
flatness, the relationship between the covariant and contravariant 
vectors is linear, and the most general relationship of this kind is 
given by making the metric a symmetric matrix $g_{\mu\nu}$. Substituting 
$dx_\mu = g_{\mu\nu} dx^\nu$, we have 


$ds^2 = g_{\mu\nu} dx^\mu dx^\nu ,$ 


where there are now implied sums over both $\mu$ and $\nu$. Notice how 
implied sums occur only when the repeated index occurs once as 
a superscript and once as a subscript; other combinations are un- 
grammatical. 


Self-check: Why does it make sense to demand that the metric 
be symmetric? 


On p. 46 we encountered the distinction among scalars, vectors, 
and dual vectors. These are specific examples of tensors, which can 
be expressed in the birdtracks notation as objects with m arrows 
coming in and n coming out, or. In index notation, we have m 
superscripts and n subscripts. A scalar has m = n = 0. A dual 
vector has (m,n) = (0,1), a vector (1,0), and the metric (0,2). We 
refer to the number of indices as the rank of the tensor. Tensors are 
discussed in more detail, and defined more rigorously, in chapter 4. 
For our present purposes, it is important to note that just because 
we write a symbol with subscripts or superscripts, that doesn’t mean 
it deserves to be called a tensor. This point can be understood in 
the more elementary context of Newtonian scalars and vectors. For 
example, we can define a Euclidean “vector” u = (m,T,e), where 
m is the mass of the moon, T is the temperature in Chicago, and 
e is the charge of the electron. This creature u doesn’t deserve 
to be called a vector, because it doesn’t behave as a vector under 
rotation. The general philosophy is that a tensor is something that 
has certain properties under changes of coordinates. For example, 
we’ve already seen on p. 48 the different scaling behavior of tensors 
with ranks (1,0), (0,0), and (0, 1). 


When discussing the symmetry of rank-2 tensors, it is convenient 
to introduce the following notation: 


$T_{(ab)} = \frac{1}{2}\left(T_{ab} + T_{ba}\right)$ 

$T_{[ab]} = \frac{1}{2}\left(T_{ab} - T_{ba}\right)$ 

Any $T_{ab}$ can be split into symmetric and antisymmetric parts. This 
is similar to writing an arbitrary function as a sum of an odd 
function and an even function. The metric has only a symmetric 
part: $g_{(ab)} = g_{ab}$, and $g_{[ab]} = 0$. This notation is generalized to 
ranks greater than 2 on page 184. 
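The split into symmetric and antisymmetric parts is easy to try on a small array of components. This is a minimal sketch that treats a rank-2 tensor simply as a nested list of numbers in some fixed coordinate system; the function names are hypothetical.

```python
def symmetric_part(T):
    """Components T_(ab) = (T_ab + T_ba) / 2."""
    n = len(T)
    return [[0.5 * (T[a][b] + T[b][a]) for b in range(n)] for a in range(n)]

def antisymmetric_part(T):
    """Components T_[ab] = (T_ab - T_ba) / 2."""
    n = len(T)
    return [[0.5 * (T[a][b] - T[b][a]) for b in range(n)] for a in range(n)]

T = [[1.0, 2.0],
     [4.0, 3.0]]
S = symmetric_part(T)
A = antisymmetric_part(T)
print(S)
print(A)
```

Adding the two parts entry by entry recovers the original array, just as adding the even and odd parts of a function recovers the function.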


Self-check: Characterize an antisymmetric rank-2 tensor in two 
dimensions. 


A change of scale Example: 7 
▷ Suppose we start by describing the Euclidean plane with a certain 
set of Cartesian coordinates, but then want to change to a 
new set of coordinates that are rescaled compared to the original 
ones. How is the effect of this rescaling represented in g? 


▷ If we change our units of measurement so that $x^\mu \to \alpha x^\mu$, while 
demanding that $ds^2$ come out the same, then we need 
$g_{\mu\nu} \to \alpha^{-2} g_{\mu\nu}$. 


Comparing with p. 48, we deduce the general rule that a tensor 
of rank (m,n) transforms under scaling by picking up a factor of 
$\alpha^{m-n}$. 


This whole notion of scaling and units in general relativity turns 
out to be nontrivial and interesting. See section 5.11, p. 202, for 
a more detailed discussion. 


Polar coordinates Example: 8 
Consider polar coordinates (r, θ) in a Euclidean plane. The constant-coordinate 
curves happen to be orthogonal everywhere, so 
the off-diagonal elements of the metric $g_{r\theta}$ and $g_{\theta r}$ vanish. Infinitesimal 
coordinate changes dr and dθ correspond to infinitesimal 
displacements dr and r dθ in orthogonal directions, so by the 
Pythagorean theorem, $ds^2 = dr^2 + r^2 d\theta^2$, and we read off the 
elements of the metric $g_{rr} = 1$ and $g_{\theta\theta} = r^2$. 
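The result of example 8 can be cross-checked by brute force. The sketch below assumes the usual relations x = r cos θ and y = r sin θ together with ds² = dx² + dy², and builds the metric from the partial derivatives of (x, y) with respect to (r, θ). This route through the Jacobian is my own illustration rather than the method of the example, and the function name is hypothetical.

```python
import math

def polar_metric(r, theta):
    """Metric components in (r, theta), computed as J^T J with J = d(x, y)/d(r, theta)."""
    J = [
        [math.cos(theta), -r * math.sin(theta)],   # dx/dr, dx/dtheta
        [math.sin(theta),  r * math.cos(theta)],   # dy/dr, dy/dtheta
    ]
    return [
        [sum(J[k][i] * J[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]

g = polar_metric(2.0, 0.7)
print(g)
```

The components come out independent of θ, with the rr component equal to 1, the θθ component equal to r², and the off-diagonal components zero up to rounding error.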


Notice how in example 8 we started from the generally valid 
relation $ds^2 = g_{\mu\nu} dx^\mu dx^\nu$, but soon began writing down facts like 
$g_{\theta\theta} = r^2$ that were only valid in this particular coordinate system. 
To make it clear when this is happening, we maintain the distinction 
between abstract Latin indices and concrete Greek indices introduced 
on p. 50. For example, we can write the general expression for 
squared differential arc length with Latin indices, 


$ds^2 = g_{ij} dx^i dx^j ,$ 


because it holds regardless of the coordinate system, whereas the 
vanishing of the off-diagonal elements of the metric in Euclidean 
polar coordinates has to be written as $g_{\mu\nu} = 0$ for $\mu \neq \nu$, since it 
would in general be false if we used a different coordinate system to 
describe the same Euclidean plane. 


Oblique Cartesian coordinates Example: 9 
▷ Oblique Cartesian coordinates are like normal Cartesian coordinates 
in the plane, but their axes are at an angle $\phi \neq \pi/2$ to 
one another. Find the metric in these coordinates. The space is 
globally Euclidean. 


▷ Since the coordinates differ from Cartesian coordinates only in 
the angle between the axes, not in their scales, a displacement 
$dx^i$ along either axis, i = 1 or 2, must give ds = dx, so for the diagonal 
elements we have $g_{11} = g_{22} = 1$. The metric is always symmetric, 
so $g_{12} = g_{21}$. To fix these off-diagonal elements, consider 
a displacement by ds in the direction perpendicular to axis 1. This 
changes the coordinates by $dx^1 = -ds\cot\phi$ and $dx^2 = ds\csc\phi$. 
We then have 


$ds^2 = g_{ij} dx^i dx^j$ 
$= ds^2\left(\cot^2\phi + \csc^2\phi - 2 g_{12}\cot\phi\csc\phi\right)$ 
$g_{12} = \cos\phi .$ 
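As a sanity check on example 9, we can plug the result back in. The sketch below uses hypothetical function names and an arbitrarily chosen angle. It takes the metric with off-diagonal element cos φ and verifies that the displacement perpendicular to axis 1 really has squared length ds².

```python
import math

def oblique_metric(phi):
    """Metric for oblique axes at angle phi, as found in example 9."""
    c = math.cos(phi)
    return [[1.0, c], [c, 1.0]]

def squared_length(g, dx):
    """ds^2 = g_ij dx^i dx^j, with the sums written out explicitly."""
    return sum(g[i][j] * dx[i] * dx[j] for i in range(2) for j in range(2))

phi = 1.0                      # an arbitrary angle in radians, not pi/2
ds = 0.01
dx = (-ds / math.tan(phi), ds / math.sin(phi))   # (-ds cot phi, ds csc phi)
print(squared_length(oblique_metric(phi), dx), ds**2)
```

A displacement of one unit along either axis alone gives a squared length of 1, consistent with the diagonal elements.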


Area Example: 10 
In one dimension, $g$ is a single number, and lengths are given by
$ds = \sqrt{g}\,dx$. The square root can also be understood through
example 7 on page 103, in which we saw that a uniform rescaling
$x \rightarrow \alpha x$ is reflected in $g_{\mu\nu} \rightarrow \alpha^{-2} g_{\mu\nu}$.


In two-dimensional Cartesian coordinates, multiplication of the
width and height of a rectangle gives the element of area
$dA = \sqrt{g_{11}g_{22}}\,dx^1 dx^2$. Because the coordinates are orthogonal, $g$ is
diagonal, and the factor of $\sqrt{g_{11}g_{22}}$ is identified as the square root
of its determinant, so $dA = \sqrt{|g|}\,dx^1 dx^2$. Note that the scales on
the two axes are not necessarily the same, $g_{11} \neq g_{22}$.


The same expression for the element of area holds even if the
coordinates are not orthogonal. In example 9, for instance, we have
$\sqrt{|g|} = \sqrt{1-\cos^2\phi} = \sin\phi$, which is the right correction factor
corresponding to the fact that $dx^1$ and $dx^2$ form a parallelogram
rather than a rectangle.
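The claim about the oblique case can be checked numerically. The sketch below, with hypothetical function names and arbitrary test values, compares $\sqrt{|g|}\,dx^1 dx^2$ with the area of the parallelogram computed directly from a two-dimensional cross product.

```python
import math

def area_element_factor(phi):
    """sqrt|det g| for the oblique metric g = [[1, cos phi], [cos phi, 1]]."""
    c = math.cos(phi)
    det = 1.0 * 1.0 - c * c
    return math.sqrt(abs(det))

def parallelogram_area(phi, dx1, dx2):
    """Area spanned by steps dx1 and dx2 along the two oblique axes."""
    u = (dx1, 0.0)
    v = (dx2 * math.cos(phi), dx2 * math.sin(phi))
    return abs(u[0] * v[1] - u[1] * v[0])   # |u x v| in two dimensions

phi = math.radians(40.0)   # assumed test angle
dx1, dx2 = 0.3, 0.2        # assumed coordinate steps
print(area_element_factor(phi) * dx1 * dx2, parallelogram_area(phi, dx1, dx2))
```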


Area of a sphere Example: 11 
For coordinates $(\theta, \phi)$ on the surface of a sphere of radius $r$, we
have, by an argument similar to that of example 8 on page 103,
$g_{\theta\theta} = r^2$, $g_{\phi\phi} = r^2\sin^2\theta$, $g_{\theta\phi} = 0$. The area of the sphere is

$$\begin{aligned}
A &= \int dA \\
  &= \int\!\!\int \sqrt{|g|}\,d\theta\,d\phi \\
  &= r^2 \int\!\!\int \sin\theta\,d\theta\,d\phi \\
  &= 4\pi r^2.
\end{aligned}$$
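The integral in example 11 is easy to confirm numerically. The following is a minimal sketch using a midpoint rule; the function name, the radius, and the number of steps are assumptions made for illustration.

```python
import math

def sphere_area(r, n_theta=2000):
    """Midpoint-rule integral of sqrt|g| = r^2 sin(theta) over theta in [0, pi].
    The integrand does not depend on phi, so the phi integral is a factor 2*pi."""
    d_theta = math.pi / n_theta
    total = 0.0
    for k in range(n_theta):
        theta = (k + 0.5) * d_theta
        g_theta_theta = r**2
        g_phi_phi = r**2 * math.sin(theta)**2
        total += math.sqrt(g_theta_theta * g_phi_phi) * d_theta
    return 2.0 * math.pi * total

r = 2.0
print(sphere_area(r), 4.0 * math.pi * r**2)   # numerical versus exact
```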


Inverse of the metric Example: 12
> Relate $g^{ij}$ to $g_{ij}$.


> The notation is intended to treat covariant and contravariant
vectors completely symmetrically. The metric with lower indices
$g_{ij}$ can be interpreted as a change-of-basis transformation from a
contravariant basis to a covariant one, and if the symmetry of the
notation is to be maintained, $g^{ij}$ must be the corresponding
inverse matrix, which changes from the covariant basis to the
contravariant one. The metric must always be invertible.


In the one-dimensional case, p. 100, the metric at any given 
point was simply some number g, and we used factors of g and 
1/g to convert back and forth between covariant and contravariant 
vectors. Example 12 makes it clear how to generalize this to more
dimensions:

$$x_a = g_{ab}\,x^b$$

$$x^a = g^{ab}\,x_b$$


This is referred to as raising and lowering indices. There is no need 
to memorize the positions of the indices in these rules; they are 
the only ones possible based on the grammatical rules, which are 
that summation only occurs over top-bottom pairs, and upper and 
lower indices have to match on both sides of the equals sign. This 
whole system, introduced by Einstein, is called “index-gymnastics” 
notation. 
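To make the bookkeeping concrete, here is a small sketch that lowers and then raises an index using the polar-coordinate metric of example 8 at an assumed radius $r = 2$. The helper names and the sample components are hypothetical; the only rule being illustrated is that sums run over one upper and one lower index.

```python
def lower_index(g, x_up):
    """x_a = g_ab x^b (sum over the repeated index b)."""
    n = len(x_up)
    return [sum(g[a][b] * x_up[b] for b in range(n)) for a in range(n)]

def raise_index(g_inv, x_down):
    """x^a = g^ab x_b, where g_inv holds the inverse metric."""
    n = len(x_down)
    return [sum(g_inv[a][b] * x_down[b] for b in range(n)) for a in range(n)]

r = 2.0
g = [[1.0, 0.0], [0.0, r**2]]             # polar metric, components (r, theta)
g_inv = [[1.0, 0.0], [0.0, 1.0 / r**2]]   # inverse matrix, as in example 12

x_up = [3.0, 0.5]                 # contravariant components (assumed values)
x_down = lower_index(g, x_up)     # covariant components
print(x_down, raise_index(g_inv, x_down))

# Squared magnitude x^a x_a, summed over a top-bottom pair:
print(sum(u * d for u, d in zip(x_up, x_down)))
```

Lowering and then raising returns the original components, which is just the statement that $g^{ab}$ and $g_{ab}$ are inverse matrices.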


Raising and lowering indices on a rank-two tensor Example: 13 
In physics we encounter various examples of matrices, such as 
the moment of inertia tensor from classical mechanics. These 
have two indices, not just one like a vector. Again, the rules for 
raising and lowering indices follow directly from grammar. For 
example,

$$A^a{}_b = g^{ac}A_{cb}$$

and

$$A_{ab} = g_{ac}\,g_{bd}\,A^{cd}.$$


A matrix operating on a vector Example: 14
The row and column vectors from linear algebra are the covariant
and contravariant vectors in our present terminology. (The
convention is that covariant vectors are row vectors and
contravariant ones column vectors, but I don’t find this worth
memorizing.) What about matrices? A matrix acting on a column vector
gives another column vector, $q = Up$. Translating this into
index-gymnastics notation, we have

$$q^a = U^{\ldots}{}_{\ldots}\,p^b,$$

where we want to figure out the correct placement of the indices
on $U$. Grammatically, the only possible placement is

$$q^a = U^a{}_b\,p^b.$$

This shows that the natural way to represent a
column-vector-to-column-vector linear operator is as a rank-2 tensor
with one upper index and one lower index.


In birdtracks notation, a rank-2 tensor is something that has two 
arrows connected to it. Our example becomes → q = → U → p.
That the result is itself an upper-index vector is shown by the fact 
that the right-hand-side taken as a whole has a single external 
arrow coming into it. 


The distinction between vectors and their duals may seem ir- 
relevant if we can always raise and lower indices at will. We can’t 
always do that, however, because in many perfectly ordinary situa- 
tions there is no metric. See example 6, p. 49. 


3.5.2 The Lorentz metric 


In a locally Euclidean space, the Pythagorean theorem allows us 
to express the metric in local Cartesian coordinates in the simple 
form $g_{\mu\mu} = +1$, $g_{\mu\nu} = 0$ for $\mu \neq \nu$, i.e., $g = \mathrm{diag}(+1, +1, \ldots, +1)$. This is
not the appropriate metric for a locally Lorentz space. The axioms 
of Euclidean geometry E3 (existence of circles) and E4 (equality of 
right angles) describe the theory’s invariance under rotations, and 
the Pythagorean theorem is consistent with this, because it gives the 
same answer for the length of a vector even if its components are 
reexpressed in a new basis that is rotated with respect to the original 
one. In a Lorentzian geometry, however, we care about invariance 
under Lorentz boosts, which do not preserve the quantity $t^2 + x^2$.
It is not circles in the $(t,x)$ plane that are invariant, but light cones,
and this is described by giving $g_{tt}$ and $g_{xx}$ opposite signs and equal
absolute values. A lightlike vector $(t,x)$, with $t = x$, therefore has
a magnitude of exactly zero,

$$s^2 = g_{tt}t^2 + g_{xx}x^2 = 0,$$

and this remains true after a Lorentz boost with velocity $v$,
$(t,x) \rightarrow (\gamma(t - vx), \gamma(x - vt))$, which sends $t = x$ to $t' = x'$. It
is a matter of convention which element of the metric to make
positive and which to make negative. In this book, I’ll use $g_{tt} = +1$
and $g_{xx} = -1$, so that $g = \mathrm{diag}(+1,-1)$. This has the
advantage that any line segment representing the timelike world-line of a
physical object has a positive squared magnitude; the forward flow 
of time is represented as a positive number, in keeping with the 
philosophy that relativity is basically a theory of how causal
relationships work. With this sign convention, timelike vectors have
positive squared magnitudes, spacelike ones negative. The same
convention is followed, for example, by Penrose. The opposite version,
with $g = \mathrm{diag}(-1,+1)$, is used by authors such as Wald and Misner,
Thorne, and Wheeler.


Our universe does not have just one spatial dimension, it has 
three, so the full metric in a Lorentz frame is given by 
$g = \mathrm{diag}(+1, -1, -1, -1)$.
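The following sketch illustrates the point of this subsection in 1+1 dimensions, in units with $c = 1$: a boost preserves $t^2 - x^2$ but not the Euclidean quantity $t^2 + x^2$, and a lightlike vector keeps a magnitude of zero. The function names and the sample velocity are assumptions chosen for illustration.

```python
import math

def interval(t, x):
    """Squared magnitude with g = diag(+1, -1), in units where c = 1."""
    return t * t - x * x

def boost(t, x, v):
    """Lorentz boost along x with velocity v, |v| < 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

v = 0.6                  # assumed boost velocity
light = (1.0, 1.0)       # lightlike: t = x
timelike = (2.0, 1.0)    # timelike: t^2 - x^2 > 0

for vec in (light, timelike):
    tb, xb = boost(vec[0], vec[1], v)
    # Lorentzian magnitude before and after, then Euclidean sum before and after
    print(interval(*vec), interval(tb, xb), vec[0]**2 + vec[1]**2, tb**2 + xb**2)
```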


Mixed covariant-contravariant form of the metric Example: 15 
In example 13 on p. 105, we saw how to raise and lower indices
on a rank-two tensor, and example 14 showed that it is sometimes
natural to consider the form in which one index is raised and one
lowered. The metric itself is a rank-two tensor, so let’s see what
happens when we compute the mixed form $g^a{}_b$ from the
lower-index form. In general, we have

$$A^a{}_b = g^{ac}A_{cb},$$

and substituting $g$ for $A$ gives

$$g^a{}_b = g^{ac}g_{cb}.$$

But we already know that $g^{ab}$ is simply the inverse matrix of $g_{ab}$
(example 12, p. 105), which means that $g^a{}_b$ is simply the identity
matrix. That is, whereas a quantity like $g_{ab}$ or $g^{ab}$ carries all the
information about our system of measurement at a given point,
$g^a{}_b$ carries no information at all. Where $g_{ab}$ or $g^{ab}$ can have both
positive and negative elements, elements that have units, and
off-diagonal elements, $g^a{}_b$ is just a generic symbol carrying no
information other than the dimensionality of the space.
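As a concrete instance, added here only as an illustration, take the polar-coordinate metric of example 8. The lower-index and upper-index forms both depend on $r$, but their product does not:

```latex
g_{ab} = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix}, \qquad
g^{ab} = \begin{pmatrix} 1 & 0 \\ 0 & r^{-2} \end{pmatrix}, \qquad
g^a{}_b = g^{ac} g_{cb} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
```

The factor of $r^2$, which carries both the units and the position dependence, cancels in the mixed form.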


The metric tensor is so commonly used that it is simply left out of 
birdtrack diagrams. Consistency is maintained because
$g^a{}_b$ is the identity matrix, so → g → is the same as →.


3.5.3 Isometry, inner products, and the Erlangen Program 


In Euclidean geometry, the dot product of vectors a and b is
given by $g_{xx}a_xb_x + g_{yy}a_yb_y + g_{zz}a_zb_z = a_xb_x + a_yb_y + a_zb_z$, and in
the special case where a = b we have the squared magnitude. In
the tensor notation, $a^\mu b_\mu = a^1b_1 + a^2b_2 + a^3b_3$. Like magnitudes,
dot products are invariant under rotations. This is because
knowing the dot product of vectors a and b entails knowing the value
of $a \cdot b = |a||b|\cos\theta_{ab}$, and Euclid’s E4 (equality of right angles)
implies that the angle $\theta_{ab}$ is invariant. The same axioms also entail
invariance of dot products under translation; Euclid waits only until
the second proposition of the Elements to prove that line segments
can be copied from one location to another. This seeming triviality is
actually false as a description of physical space, because it amounts
to a statement that space has the same properties everywhere.


The set of all transformations that can be built out of succes- 
sive translations, rotations, and reflections is called the group of 
isometries. It can also be defined as the group⁵ that preserves dot
products, or the group that preserves congruence of triangles. 


In Lorentzian geometry, we usually avoid the Euclidean term 
dot product and refer to the corresponding operation by the more 
general term inner product. In a specific coordinate system we have
$a^\mu b_\mu = a^0b^0 - a^1b^1 - a^2b^2 - a^3b^3$, where the minus signs come
from lowering the index on b with the metric. The inner product is invariant under
Lorentz boosts, and also under the Euclidean isometries. The group
found by making all possible combinations of continuous
transformations⁶ from these two sets is called the Poincaré group. The
Poincaré group is not the symmetry group of all of spacetime, since 
curved spacetime has different properties in different locations. The 
equivalence principle tells us, however, that space can be approxi- 
mated locally as being flat, so the Poincaré group is locally valid, 
just as the Euclidean isometries are locally valid as a description of 
geometry on the Earth’s curved surface. 


The triangle inequality Example: 16 
In Euclidean geometry, the triangle inequality $|b + c| \leq |b| + |c|$
follows from

$$(|b| + |c|)^2 - (b + c)\cdot(b + c) = 2(|b||c| - b\cdot c) \geq 0.$$


The reason this quantity always comes out positive is that for two 
vectors of fixed magnitude, the greatest dot product is always 
achieved in the case where they lie along the same direction. 


In Lorentzian geometry, the situation is different. Let b and c be 
timelike vectors, so that they represent possible world-lines. Then 
the relation a = b+c suggests the existence of two observers who 
take two different paths from one event to another. A goes by a 
direct route while B takes a detour. The magnitude of each
timelike vector represents the time elapsed on a clock carried by the


⁵ In mathematics, a group is defined as a set together with a binary operation
that has an identity, inverses, and associativity. For example, the integers
under addition form a group. In the present context, the members of the group
are not numbers but the transformations applied to the Euclidean plane. The
group operation on transformations $T_1$ and $T_2$ consists of finding the
transformation that results from doing one and then the other, i.e.,
composition of functions.

⁶ The discontinuous transformations of spatial reflection and time reversal are
not included in the definition of the Poincaré group, although they do preserve
inner products. General relativity has symmetry under spatial reflection (called
P for parity), time reversal (T), and charge inversion (C), but the standard
model of particle physics is only invariant under the composition of all three,
CPT, not under any of these symmetries individually.


observer moving along that vector. The triangle inequality is now
reversed, becoming $|b + c| \geq |b| + |c|$. The difference from the
Euclidean case arises because inner products are no longer
necessarily maximized if vectors are in the same direction. E.g., for
two lightlike vectors, $b^i c_i$ vanishes entirely if b and c are
parallel. For timelike vectors, parallelism actually minimizes the inner
product rather than maximizing it.⁷
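A numerical instance of the reversed inequality, offered as a sketch with arbitrarily chosen vectors in 1+1 dimensions and $g = \mathrm{diag}(+1,-1)$: observer B travels out and back along b and c, while A goes straight from the first event to the last along a = b + c.

```python
import math

def inner(a, b):
    """Inner product with g = diag(+1, -1) for vectors written as (t, x)."""
    return a[0] * b[0] - a[1] * b[1]

def magnitude(a):
    """Proper time along a timelike vector."""
    return math.sqrt(inner(a, a))

b = (5.0, 3.0)     # timelike, outbound leg (assumed values)
c = (5.0, -3.0)    # timelike, return leg
a = (b[0] + c[0], b[1] + c[1])   # the direct route

print(magnitude(a), magnitude(b) + magnitude(c))   # 10.0 versus 8.0
```

The direct route accumulates more proper time than the detour, which is the opposite of what the Euclidean triangle inequality would suggest for lengths.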


In his 1872 inaugural address at the University of Erlangen, Felix 
Klein used the idea of groups of transformations to lay out a gen- 
eral classification scheme, known as the Erlangen program, for all 
the different types of geometry. Each geometry is described by the 
group of transformations, called the principal group, that preserves 
the truth of geometrical statements. Euclidean geometry’s principal 
group consists of the isometries combined with arbitrary changes of 
scale, since there is nothing in Euclid’s axioms that singles out a 
particular distance as a unit of measurement. In other words, the 
principal group consists of the transformations that preserve simi- 
larity, not just those that preserve congruence. Affine geometry’s 
principal group is the transformations that preserve parallelism; it 
includes shear transformations, and there is therefore no invariant 
notion of angular measure or congruence. Unlike Euclidean and 
affine geometry, elliptic geometry does not have scale invariance. 
This is because there is a particular unit of distance that has special 
status; as we saw in example 4 on page 96, a being living in an el- 
liptic plane can determine, by entirely intrinsic methods, a distance 
scale R, which we can interpret in the hemispherical model as the 
radius of the sphere. General relativity breaks this symmetry even 
more severely. Not only is there a scale associated with curvature, 
but the scale is different from one point in space to another. 


3.5.4 Einstein’s carousel 


Non-Euclidean geometry observed in the rotating frame 


The following example was historically important, because Ein- 
stein used it to convince himself that general relativity should be 
described by non-Euclidean geometry.⁸ Its interpretation is also
fairly subtle, and the early relativists had some trouble with it. 


Suppose that observer A is on a spinning carousel while observer 
B stands on the ground. B says that A is accelerating, but by the


⁷ Proof: Let b and c be parallel and timelike, and directed forward in time.
Adopt a frame of reference in which every spatial component of each vector 
vanishes. This entails no loss of generality, since inner products are invariant 
under such a transformation. Since the time-ordering is also preserved under 
transformations in the Poincaré group, each is still directed forward in time, not 
backward. Now let b and c be pulled away from parallelism, like opening a pair
of scissors in the $x$-$t$ plane. For vectors of fixed magnitude, this increases
$b_tc_t$, while causing $b_xc_x$ to become
negative. Both effects increase the inner product.

⁸ The example is described in Einstein’s paper “The Foundation of the General
Theory of Relativity.” An excerpt, which includes the example, is given on p. ??. 


d / Observer A, rotating with the carousel, measures an azimuthal distance with a ruler.


equivalence principle A can say that she is at rest in a gravitational
field, while B is free-falling out from under her. B measures the
radius and circumference of the carousel, and finds that their ratio
is $2\pi$. A carries out similar measurements, but when she puts her
meter-stick in the azimuthal direction it becomes Lorentz-contracted
by the factor $\gamma = (1-\omega^2r^2)^{-1/2}$, so she finds that the ratio is greater
than $2\pi$. In A’s coordinates, the spatial geometry is non-Euclidean,
and the metric differs from the Euclidean one found in example 8
on page 103.


Observer A feels a force that B considers to be fictitious, but 
that, by the equivalence principle, A can say is a perfectly real 
gravitational force. According to A, an observer like B is free-falling 
away from the center of the disk under the influence of this gravita- 
tional field. A also observes that the spatial geometry of the carousel 
is non-Euclidean. Therefore it seems reasonable to conjecture that 
gravity can be described by non-Euclidean geometry, rather than as 
a physical force in the Newtonian sense. 


At this point, you know as much about this example as Einstein 
did in 1912, when he began using it as the seed from which general 
relativity sprouted, collaborating with his old schoolmate, mathe- 
matician Marcel Grossmann, who knew about differential geometry. 
The remainder of subsection 3.5.4, which you may want to skip on a 
first reading, goes into more detail on the interpretation and math- 
ematical description of the rotating frame of reference. Even more 
detailed treatments are given by Grøn⁹ and Dieks.¹⁰


Ehrenfest’s paradox 


Ehrenfest¹¹ described the following paradox. Suppose that
observer B, in the lab frame, measures the radius of the disk to be $r$
when the disk is at rest, and $r'$ when the disk is spinning. B can
also measure the corresponding circumferences $C$ and $C'$. Because
B is in an inertial frame, the spatial geometry does not appear
non-Euclidean according to measurements carried out with his meter
sticks, and therefore the Euclidean relations $C = 2\pi r$ and $C' = 2\pi r'$
both hold. The radial lines are perpendicular to their own motion, 
and they therefore have no length contraction, r = r’, implying 
C=C’. The outer edge of the disk, however, is everywhere tan- 
gent to its own direction of motion, so it is Lorentz contracted, and 
therefore C’ < C. The resolution of the paradox is that it rests on 
the incorrect assumption that a rigid disk can be made to rotate. 
If a perfectly rigid disk was initially not rotating, one would have 
to distort it in order to set it into rotation, because once it was 


⁹ Relativistic description of a rotating disk, Am. J. Phys. 43 (1975) 869

¹⁰ Space, Time, and Coordinates in a Rotating World, http://www.phys.uu.nl/igg/dieks

¹¹ P. Ehrenfest, Gleichförmige Rotation starrer Körper und Relativitätstheorie,
Z. Phys. 10 (1909) 918, available in English translation at en.wikisource.org.


rotating its outer edge would no longer have a length equal to $2\pi$
times its radius. Therefore if the disk is perfectly rigid, it can never 
be rotated. As discussed on page 64, relativity does not allow the 
existence of infinitely rigid or infinitely strong materials. If it did, 
then one could violate causality. If a perfectly rigid disk existed, vi- 
brations in the disk would propagate at infinite velocity, so tapping 
the disk with a hammer in one place would result in the transmis- 
sion of information at v > c to other parts of the disk, and then 
there would exist frames of reference in which the information was 
received before it was transmitted. The same applies if the hammer 
tap is used to impart rotational motion to the disk. 


Self-check: What if we build the disk by assembling the building 
materials so that they are already rotating properly before they are 
joined together? 


The metric in the rotating frame 


What if we try to get around these problems by applying torque 
uniformly all over the disk, so that the rotation starts smoothly and 
simultaneously everywhere? We then run into issues identical to the 
ones raised by Bell’s spaceship paradox (p. 65). In fact, Ehrenfest’s 
paradox is nothing more than Bell’s paradox wrapped around into 
a circle. The same question of time synchronization comes up. 


To spell this out mathematically, let’s find the metric according
to observer A by applying the change of coordinates $\theta' = \theta - \omega t$.
First we take the Euclidean metric of example 8 on page 103 and
rewrite it as a (globally) Lorentzian metric in spacetime for observer
B,

[1] $ds^2 = dt^2 - dr^2 - r^2\,d\theta^2$.

Applying the transformation into A’s coordinates, we find

[2] $ds^2 = (1 - \omega^2r^2)\,dt^2 - dr^2 - r^2\,d\theta'^2 - 2\omega r^2\,d\theta'\,dt$.


Recognizing $\omega r$ as the velocity of one frame relative to another, and
$(1-\omega^2r^2)^{-1/2}$ as $\gamma$, we see that we do have a relativistic time dilation
effect in the $dt^2$ term. But the $dr^2$ and $d\theta'^2$ terms look Euclidean.
Why don’t we see any Lorentz contraction of the length scale in the
azimuthal direction?
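The step from [1] to [2] is only the algebra implied by the stated change of coordinates. Written out, in units with $c = 1$:

```latex
\begin{align*}
\theta &= \theta' + \omega t
  \quad\Rightarrow\quad d\theta = d\theta' + \omega\,dt \\
r^2\,d\theta^2 &= r^2\,d\theta'^2 + 2\omega r^2\,d\theta'\,dt + \omega^2 r^2\,dt^2 \\
ds^2 &= dt^2 - dr^2 - r^2\,d\theta^2 \\
     &= (1-\omega^2 r^2)\,dt^2 - dr^2 - r^2\,d\theta'^2 - 2\omega r^2\,d\theta'\,dt
\end{align*}
```

The $\omega^2 r^2\,dt^2$ piece is what modifies the coefficient of $dt^2$, and the mixed piece is the cross-term discussed next.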


The answer is that coordinates in general relativity are arbitrary, 
and just because we can write down a certain set of coordinates, that 
doesn’t mean they have any special physical interpretation. The 
coordinates $(t,r,\theta')$ do not correspond physically to the quantities
that A would measure with clocks and meter-sticks. The tip-off
is the $d\theta'\,dt$ cross-term. Suppose that A sends two cars driving
around the circumference of the carousel, one clockwise and one
counterclockwise, from the same point. If $(t,r,\theta')$ coordinates
corresponded to clock and meter-stick measurements, then we would


e / Einstein and


expect that when the cars met up again on the far side of the disk,
their dashboards would show equal values of the arc length $r\theta'$ on
their odometers and equal proper times $ds$ on their clocks. But this
is not the case, because the sign of the $d\theta'\,dt$ term is opposite for the
two world-lines. The same effect occurs if we send beams of light
in both directions around the disk, and this is the Sagnac effect (p. 73).


This is a symptom of the fact that the coordinate t is not prop- 
erly synchronized between different places on the disk. We already 
know that we should not expect to be able to find a universal time 
coordinate that will match up with every clock, regardless of the 
clock’s state of motion. Suppose we set ourselves a more modest 
goal. Can we find a universal time coordinate that will match up 
with every clock, provided that the clock is at rest relative to the 
rotating disk? 


The spatial metric and synchronization of clocks 
A trick for improving the situation is to eliminate the $d\theta'\,dt$
cross-term by completing the square in the metric [2]. The result is

$$ds^2 = (1 - \omega^2r^2)\left[dt - \frac{\omega r^2}{1-\omega^2r^2}\,d\theta'\right]^2 - dr^2 - \frac{r^2}{1-\omega^2r^2}\,d\theta'^2.$$

The interpretation of the quantity in square brackets is as follows. 
Suppose that two observers situate themselves on the edge of the 
disk, separated by an infinitesimal angle $d\theta'$. They then synchronize
their clocks by exchanging light pulses. The time of flight, measured
in the lab frame, for each light pulse is the solution of the equation
$ds^2 = 0$, and the only difference between the clockwise result $dt_1$
and the counterclockwise one $dt_2$ arises from the sign of $d\theta'$. The
quantity in square brackets is the same in both cases, so the amount
by which the clocks must be adjusted is $dt = (dt_2 - dt_1)/2$, or

$$dt = \frac{\omega r^2}{1-\omega^2r^2}\,d\theta'.$$

Substituting this into the metric, we are left with the purely spatial
metric

[3] $ds^2 = -dr^2 - \frac{r^2}{1-\omega^2r^2}\,d\theta'^2$.

The factor of $(1-\omega^2r^2)^{-1} = \gamma^2$ in the $d\theta'^2$ term is simply the
expected Lorentz-contraction factor. In other words, the
circumference is, as expected, greater than $2\pi r$ by a factor of $\gamma$.
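A numerical sanity check of the two steps above, as a sketch in units with $c = 1$. It assumes that the sign of the term inside the square bracket is the one fixed by the cross-term in [2], and the function names and sample values are hypothetical.

```python
import math

def ds2_eq2(w, r, dt, dr, dth):
    """Line element [2] in the rotating coordinates (t, r, theta')."""
    return (1 - w*w*r*r)*dt*dt - dr*dr - r*r*dth*dth - 2*w*r*r*dth*dt

def ds2_completed(w, r, dt, dr, dth):
    """The same line element after completing the square in dt.
    The sign inside the bracket is chosen to match the cross-term of [2]."""
    k = 1 - w*w*r*r
    bracket = dt - (w*r*r / k) * dth
    return k * bracket**2 - dr*dr - (r*r / k) * dth*dth

def circumference(w, r):
    """Length of the circle r = const according to the spatial metric [3]."""
    return 2 * math.pi * r / math.sqrt(1 - w*w*r*r)

w, r = 0.3, 2.0                     # assumed values, giving w*r = 0.6
dt, dr, dth = 0.01, 0.02, -0.03     # arbitrary small displacements
print(ds2_eq2(w, r, dt, dr, dth), ds2_completed(w, r, dt, dr, dth))
print(circumference(w, r) / (2 * math.pi * r))   # gamma = 1.25 for w*r = 0.6
```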


Does the metric [3] represent the same non-Euclidean spatial 
geometry that A, rotating with the disk, would determine by meter- 
stick measurements? Yes and no. It can be interpreted as the 
one that A would determine by radar measurements. That is, if
A measures a round-trip travel time dt for a light signal between
points separated by coordinate distances $dr$ and $d\theta'$, then A can say
that the spatial separation is dt/2, and such measurements will be 
described correctly by [3]. Physical meter-sticks, however, present 
some problems. Meter-sticks rotating with the disk are subject to 
Coriolis and centrifugal forces, and this problem can’t be avoided 
simply by making the meter-sticks infinitely rigid, because infinitely 
rigid objects are forbidden by relativity. In fact, these forces will in- 
evitably be strong enough to destroy any meter stick that is brought 
out to r = 1/w, where the speed of the disk becomes equal to the 
speed of light. 


It might appear that we could now define a global coordinate

$$T = t - \frac{\omega r^2}{1-\omega^2r^2}\,\theta',$$

interpreted as a time coordinate that was synchronized in a
consistent way for all points on the disk. The trouble with this
interpretation becomes evident when we imagine driving a car around
the circumference of the disk, at a speed slow enough so that there 
is negligible time dilation of the car’s dashboard clock relative to 
the clocks tied to the disk. Once the car gets back to its original 
position, $\theta'$ has increased by $2\pi$, so it is no longer possible for the
car’s clock to be synchronized with the clocks tied to the disk. We 
conclude that it is not possible to synchronize clocks in a rotating 
frame of reference; if we try to do it, we will inevitably have to have 
a discontinuity somewhere. This problem is present even locally, as 
demonstrated by the possibility of measuring the Sagnac effect with 
apparatus that is small compared to the disk. The only reason we 
were able to get away with time synchronization in order to establish 
the metric [3] is that all the physical manifestations of the impossi- 
bility of synchronization, e.g., the Sagnac effect, are proportional to 
the area of the region in which synchronization is attempted. Since 
we were only synchronizing two nearby points, the area enclosed by 
the light rays was zero. 
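To get a feeling for the size of the discontinuity, one can integrate the synchronization offset $dt = \omega r^2\,d\theta'/(1-\omega^2r^2)$ once around a circle of radius $r$. The sketch below restores factors of $c$ and uses rounded values for the earth’s rotation rate and equatorial radius; these numbers and the function name are assumptions made for illustration, not values taken from the text.

```python
import math

C = 299_792_458.0   # speed of light in m/s

def sync_gap(omega, r, c=C):
    """Clock offset accumulated when synchronization is carried once around
    a circle of radius r on a disk rotating at angular velocity omega:
    the integral of (omega r^2 / c^2) / (1 - omega^2 r^2 / c^2) d(theta')
    from 0 to 2*pi."""
    beta2 = (omega * r / c) ** 2
    return 2.0 * math.pi * omega * r**2 / (c**2 * (1.0 - beta2))

# Rounded, assumed values for the earth's equator:
omega_earth = 7.292e-5   # rad/s
r_earth = 6.378e6        # m
print(sync_gap(omega_earth, r_earth))   # roughly 2e-7 s

# Doubling the radius quadruples the gap when omega*r << c,
# i.e., the effect scales with the enclosed area.
print(sync_gap(omega_earth, 2 * r_earth) / sync_gap(omega_earth, r_earth))
```

For slow rotation the result is $2\omega/c^2$ times the enclosed area $\pi r^2$, consistent with the statement that the effect vanishes when only two nearby points are synchronized.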


GPS Example: 17 
As a practical example, the GPS system is designed mainly to 
allow people to find their positions relative to the rotating surface 
of the earth (although it can also be used by space vehicles). That 
is, they are interested in their $(r, \theta', \phi)$ coordinates. The frame of
reference defined by these coordinates is referred to as ECEF, for 
Earth-Centered, Earth-Fixed. 


The system requires synchronization of the atomic clocks carried 
aboard the satellites, and this synchronization also needs to be 
extended to the (less accurate) clocks built into the receiver units. 
It is impossible to carry out such a synchronization globally in the 
rotating frame in order to create coordinates $(T, r, \theta', \phi)$. If we
tried, it would result in discontinuities (see problem 8, p. 121).
Instead, the GPS system handles clock synchronization in
coordinates $(t, r, \theta', \phi)$, as in equation [2]. These are known as the
Earth-Centered Inertial (ECI) coordinates. The t coordinate in 
this system is not the one that users at neighboring points on 
the earth’s surface would establish if they carried out clock syn- 
chronization using electromagnetic signals. It is simply the time 
coordinate of the nonrotating frame of reference tied to the earth’s 
center. Conceptually, we can imagine this time coordinate as one 
that is established by sending out an electromagnetic “tick-tock” 
signal from the earth’s center, with each satellite correcting the 
phase of the signal based on the propagation time inferred from 
its own r. In reality, this is accomplished by communication with a 
master control station in Colorado Springs, which communicates 
with the satellites via relays at Kwajalein, Ascension Island, Diego 
Garcia, and Cape Canaveral. 


Einstein’s goof, in the rotating frame Example: 18 
Example 10 on p. 57 recounted Einstein’s famous mistake in pre- 
dicting that a clock at the pole would experience a time dilation 
relative to a clock at the equator, and the empirical test of this fact 
by Alley et al. using atomic clocks. The perfect cancellation of 
gravitational and kinematic time dilations might seem fortuitous, 
but in fact it isn’t. When we transform into the frame rotating along
with the earth, there is no longer any kinematic effect at all,
because neither clock is moving. In this frame, the surface of the
earth’s oceans is an equipotential, so the gravitational time
dilation vanishes as well, assuming both clocks are at sea level.
In the transformation to the rotating frame, the metric picks up a
$d\theta'\,dt$ term, but since both clocks are fixed to the earth’s surface,
they have $d\theta' = 0$, and there is no Sagnac effect.


Impossibility of rigid rotation, even with external forces 


The determination of the spatial metric with rulers at rest rel- 
ative to the disk is appealing because of its conceptual simplicity 
compared to complicated procedures involving radar, and this was 
presumably why Einstein presented the concept using ruler
measurements in his 1916 paper laying out the general theory of relativity.¹²
In an effort to recover this simplicity, we could propose using exter- 
nal forces to compensate for the centrifugal and Coriolis forces to 
which the rulers would be subjected, causing them to stay straight 
and maintain their correct lengths. Something of this kind is car- 
ried out with the large mirrors of some telescopes, which have active 
systems that compensate for gravitational deflections and other ef- 
fects. The first issue to worry about is that one would need some 
way to monitor a ruler’s length and straightness. The monitoring 
system would presumably be based on measurements with beams 


¹² The paper is reproduced in the back of the book, and the relevant part is
on p. ??.


of light, in which case the physical rulers themselves would become
superfluous.


In addition, we would need to be able to manipulate the rulers in 
order to place them where we wanted them, and these manipulations 
would include angular accelerations. If such a thing was possible, 
then it would also amount to a loophole in the resolution of the 
Ehrenfest paradox. Could Ehrenfest’s rotating disk be accelerated 
and decelerated with help from external forces, which would keep it 
from contorting into a potato chip? The problem we run into with 
such a strategy is one of clock synchronization. When it was time to 
impart an angular acceleration to the disk, all of the control systems 
would have to be activated simultaneously. But we have already 
seen that global clock synchronization cannot be realized for an 
object with finite area, and therefore there is a logical contradiction 
in this proposal. This makes it impossible to apply rigid angular 
acceleration to the disk, but not necessarily the rulers, which could 
in theory be one-dimensional. 


3.6 The metric in general relativity 


So far we’ve considered a variety of examples in which the metric 
is predetermined. This is not the case in general relativity. For 
example, Einstein published general relativity in 1915, but it was 
not until 1916 that Schwarzschild found the metric for a spherical, 
gravitating body such as the sun or the earth. 


When masses are present, finding the metric is analogous to 
finding the electric field made by charges, but the interpretation is 
more difficult. In the electromagnetic case, the field is found on 
a preexisting background of space and time. In general relativity, 
there is no preexisting geometry of spacetime. The metric tells us 
how to find distances in terms of our coordinates, but the coordi- 
nates themselves are completely arbitrary. So what does the metric 
even mean? This was an issue that caused Einstein great distress 
and confusion, and at one point, in 1914, it even led him to pub- 
lish an incorrect, dead-end theory of gravity in which he abandoned 
coordinate-independence. 


With the benefit of hindsight, we can consider these issues in 
terms of the general description of measurements in relativity given 
on page 98: 

1. We can tell whether events and world-lines are incident. 


2. We can do measurements in local Lorentz frames. 


3.6.1 The hole argument 


The main factor that led Einstein to his false start is known as 
the hole argument. Suppose that we know about the distribution of


a / Einstein’s hole argument.


b / A paradox? Planet A has no equatorial bulge, but B does. What cause produces this effect? Einstein reasoned that the cause couldn’t be B’s rotation, because each planet rotates relative to the other.


matter throughout all of spacetime, including a particular region of 
finite size, the “hole,” which contains no matter. By analogy
with other classical field theories, such as electromagnetism, we ex- 
pect that the metric will be a solution to some kind of differential 
equation, in which matter acts as the source term. We find a metric 
g(x) that solves the field equations for this set of sources, where x is 
some set of coordinates. Now if the field equations are coordinate- 
independent, we can introduce a new set of coordinates x’, which is 
identical to x outside the hole, but differs from it on the inside. If 
we reexpress the metric in terms of these new coordinates as g'(x’), 
then we are guaranteed that g’(x’) is also a solution. But further- 
more, we can substitute x for x’, and g’(x) will still be a solution. 
For outside the hole there is no difference between the primed and 
unprimed quantities, and inside the hole there is no mass distribu- 
tion that has to match the metric’s behavior on a point-by-point 
basis. 


We conclude that in any coordinate-invariant theory, it is impos- 
sible to uniquely determine the metric inside such a hole. Einstein 
initially decided that this was unacceptable, because it showed a 
lack of determinism; in a classical theory such as general relativity, 
we ought to be able to predict the evolution of the fields, and it 
would seem that there is no way to predict the metric inside the 
hole. He eventually realized that this was an incorrect interpreta- 
tion. The only type of global observation that general relativity lets 
us do is measurements of the incidence of world-lines. Relabeling all 
the points inside the hole doesn’t change any of the incidence rela- 
tions. For example, if two test particles sent into the region collide 
at a point x inside the hole, then changing the point’s name to x’ 
doesn’t change the observable fact that they collided. 


A Machian paradox 


Another type of argument that made Einstein suffer is also resolved
by a correct understanding of measurements, this time the
use of measurements in local Lorentz frames. The earth is in
hydrostatic equilibrium, and its equator bulges due to its rotation.
Suppose that the universe were empty except for two planets, each
rotating about the line connecting their centers.13 Since there are
no stars or other external points of reference, the inhabitants of each
planet have nothing against which to judge their rotation or lack of
rotation. They can only determine their rotation,
Einstein said, relative to the other planet. Now suppose that one
planet has an equatorial bulge and the other doesn’t. This seems to
violate determinism, since there is no cause that could produce the
differing effect. The people on either planet can consider themselves
as rotating and the other planet as stationary, or they can describe
the situation the other way around. Einstein believed that this
argument proved that there could be no difference between the sizes
of the two planets’ equatorial bulges.


13 The example is described in Einstein’s paper “The Foundation of the General
Theory of Relativity.” An excerpt, which includes the example, is given on p. ??.


The flaw in Einstein’s argument was that measurements in local 
Lorentz frames do allow one to make a distinction between rotation 
and a lack of rotation. For example, suppose that scientists on 
planet A notice that their world has no equatorial bulge, while planet 
B has one. They send a space probe with a clock to B, let it stay 
on B’s surface for a few years, and then order it to return. When 
the clock is back in the lab, they compare it with another clock that 
stayed in the lab on planet A, and they find that less time has elapsed 
according to the one that spent time on B’s surface. They conclude 
that planet B is rotating more quickly than planet A, and that the 
motion of B’s surface was the cause of the observed time dilation. 
This resolution of the apparent paradox depends specifically on the 
Lorentzian form of the local geometry of spacetime; it is not available 
in, e.g., Cartan’s curved-spacetime description of Newtonian gravity 
(see page 41). 
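
As a rough numerical illustration of the clock comparison described above, the following Python sketch estimates how far a clock riding on B's equator falls behind one on a non-rotating planet. It is only a sketch under stated assumptions: the radius, rotation period, and duration are made-up, roughly earth-like values, gravitational effects and the probe's journey are ignored, and only the special-relativistic factor sqrt(1 - v^2/c^2) for the surface speed is applied.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def surface_speed(radius_m, period_s):
    """Speed of a point on the equator of a rigidly rotating planet."""
    return 2.0 * math.pi * radius_m / period_s

def elapsed_on_moving_clock(lab_time_s, speed_m_s):
    """Time elapsed on a clock moving at constant speed, given the time
    elapsed in the frame where the planet's center is at rest."""
    return lab_time_s * math.sqrt(1.0 - (speed_m_s / C) ** 2)

# Hypothetical values chosen only for illustration.
radius = 6.4e6               # m
period = 86_400.0            # s
lab_time = 3.0 * 3.156e7     # about three years, in seconds

v = surface_speed(radius, period)
deficit = lab_time - elapsed_on_moving_clock(lab_time, v)
print(f"surface speed: {v:.1f} m/s")
print(f"clock on B falls behind by about {deficit * 1e6:.1f} microseconds")
```

The effect is tiny for these values, but it has a definite sign, which is what lets the scientists on A attribute the rotation to B.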


Einstein’s original, incorrect use of this example sprang from his 
interest in the ideas of the physicist and philosopher Ernst Mach. 
Mach had a somewhat ill-defined idea that since motion is only a 
well-defined notion when we speak of one object moving relative 
to another object, the inertia of an object must be caused by the 
influence of all the other matter in the universe. Einstein referred 
to this as Mach’s principle. Einstein’s false starts in constructing 
general relativity were frequently related to his attempts to make his 
theory too “Machian.” Section 8.3 on p. 357 discusses an alternative, 
more Machian theory of gravity proposed by Brans and Dicke in
1961.


3.7 Interpretation of coordinate independence 


This section discusses some of the issues that arise in the inter- 
pretation of coordinate independence. It can be skipped on a first 
reading. 


3.7.1 Is coordinate independence obvious? 


One often hears statements like the following from relativists: 
“Coordinate independence isn’t really a physical principle. It’s 
merely an obvious statement about the relationship between math- 
ematics and the physical universe. Obviously the universe doesn’t 
come equipped with coordinates. We impose those coordinates on 
it, and the way in which we do so can never be dictated by nature.” 
The impressionable reader who is tempted to say, “Ah, yes, that is 
obvious,” should consider that it was far from obvious to Newton 
(“Absolute, true and mathematical time, of itself, and from its own 
nature flows equably without regard to anything external ...”), nor
was it obvious to Einstein. Levi-Civita nudged Einstein in the
direction of coordinate independence in 1912. Einstein tried hard to
make a coordinate-independent theory, but for reasons described in 
section 3.6.1 (p. 115), he convinced himself that that was a dead 
end. In 1914-15 he published theories that were not coordinate- 
independent, which you will hear relativists describe as “obvious” 
dead ends because they lack any geometrical interpretation. It seems 
to me that it takes a highly refined intuition to regard as intuitively 
“obvious” an issue that Einstein struggled with like Jacob wrestling 
with Elohim. 


3.7.2 Is coordinate independence trivial?


It has also been alleged that coordinate independence is trivial. 
To gauge the justice of this complaint, let’s distinguish between two 
reasons for caring about coordinate independence: 


1. Coordinate independence tells us that when we solve problems, 
we should avoid writing down any equations in notation that 
isn’t manifestly intrinsic, and avoid interpreting those equa- 
tions as if the coordinates had intrinsic meaning. Violating 
this advice doesn’t guarantee that you’ve made a mistake, but 
it makes it much harder to tell whether or not you have. 


2. Coordinate independence can be used as a criterion for judging 
whether a particular theory is likely to be successful. 


Nobody questions the first justification. The second is a little trickier.
Laying out the general theory systematically in a 1916 paper,14
Einstein wrote “The general laws of nature are to be expressed by 
equations which hold good for all the systems of coordinates, that is, 
are covariant with respect to any substitutions whatever (generally 
covariant).” In other words, he was explaining why, with hindsight, 
his 1914-1915 coordinate-dependent theory had to be a dead end. 


The only trouble with this is that Einstein’s way of posing the 
criterion didn’t quite hit the nail on the head mathematically. As 
Hilbert famously remarked, “Every boy in the streets of Göttingen
understands more about four-dimensional geometry than Einstein. 
Yet, in spite of that, Einstein did the work and not the mathemati- 
cians.” What Einstein had in mind was that a theory like Newtonian 
mechanics not only lacks coordinate independence, but would also 
be impossible to put into a coordinate-independent form without 
making it look hopelessly complicated and ugly, like putting lipstick 
on a pig. But Kretschmann showed in 1917 that any theory could
be put in coordinate-independent form, and Cartan demonstrated in
1923 that this could be done for Newtonian mechanics in a way that
didn’t come out particularly ugly. Physicists today are more apt to
pose the distinction in terms of “background independence” (meaning
that a theory should not be phrased in terms of an assumed
geometrical background) or lack of a “prior geometry” (meaning that
the curvature of spacetime should come from the solution of field
equations rather than being imposed by fiat). But these concepts as
well have resisted precise mathematical formulation.15 My feeling
is that this general idea of coordinate independence or background 
independence is like the equivalence principle: a crucial conceptual 
principle that doesn’t lose its importance just because we can’t put 
it in a mathematical box with a ribbon and a bow. For example, 
string theorists take it as a serious criticism of their theory that it is 
not manifestly background independent, and one of their goals is to 
show that it has a background independence that just isn’t obvious 
on the surface. 


3.7.3 Coordinate independence as a choice of gauge 


It is instructive to consider coordinate independence from the 
point of view of a field theory. Newtonian gravity can be described 
in three equivalent ways: as a gravitational field g, as a gravitational 
potential φ, or as a set of gravitational field lines. The field lines are
never incident on one another, and locally the field satisfies Poisson’s 
equation. 


The electromagnetic field has polarization properties different 
from those of the gravitational field, so we describe it using either 
the two fields (E, B), a pair of potentials,16 or two sets of field
lines. There are similar incidence conditions and local field equations 
(Maxwell’s equations). 


Gravitational fields in relativity have polarization properties un- 
known to Newton, but the situation is qualitatively similar to the 
two foregoing cases. Now consider the analogy between electromag- 
netism and relativity. In electromagnetism, it is the fields that are 
directly observable, so we expect the potentials to have some extrin- 
sic properties. We can, for example, redefine our electrical ground, 
Φ → Φ + C, without any observable consequences. As discussed in
more detail in section 5.6.1 on page 173, it is even possible to modify 
the electromagnetic potentials in an entirely arbitrary and nonlinear 
way that changes from point to point in spacetime. This is called a 
gauge transformation. In relativity, the gauge transformations are 
the smooth coordinate transformations. These gauge transforma- 
tions distort the field lines without making them cut through one 
another. 
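
To make the electromagnetic side of the analogy concrete, here is the standard form of a gauge transformation, written with the scalar potential φ and vector potential A. The function χ is an arbitrary smooth function of position and time, introduced here only for illustration; it does not appear in the text above. The shift of the electrical ground by a constant C corresponds to the special case χ = −Ct.

```latex
\begin{align*}
  \phi &\to \phi' = \phi - \frac{\partial\chi}{\partial t}, &
  \mathbf{A} &\to \mathbf{A}' = \mathbf{A} + \nabla\chi, \\
  \mathbf{E}' &= -\nabla\phi' - \frac{\partial\mathbf{A}'}{\partial t}
     = -\nabla\phi + \nabla\frac{\partial\chi}{\partial t}
       - \frac{\partial\mathbf{A}}{\partial t}
       - \frac{\partial}{\partial t}\nabla\chi
     = \mathbf{E}, \\
  \mathbf{B}' &= \nabla\times\mathbf{A}'
     = \nabla\times\mathbf{A} + \nabla\times\nabla\chi
     = \mathbf{B}.
\end{align*}
```

The observable fields are unchanged because the mixed partial derivatives of χ cancel and the curl of a gradient vanishes.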


15 Giulini, “Some remarks on the notions of general covariance and background
independence,” arxiv.org/abs/gr-qc/0603087v1

16 There is the familiar electrical potential φ, measured in volts, but also a
vector potential A, which you may or may not have encountered. Briefly, the
electric field is given not by −∇φ but by −∇φ − ∂A/∂t, while the magnetic field
is the curl of A. This is introduced at greater length in section 4.2.5 on page
137.


a / Since magnetic field lines can never intersect, a magnetic
field pattern contains coordinate-independent information in the
form of the knotting of the lines. This figure shows the magnetic
field pattern of the star SU Aurigae, as measured by
Zeeman-Doppler imaging (Petit et al.). White lines represent
magnetic field lines that close upon themselves in the immediate
vicinity of the star; blue lines are those that extend out into the
interstellar medium.


Problems


1 Consider a spacetime that is locally exactly like the stan- 
dard Lorentzian spacetime described in ch. 2, but that has a global 
structure differing in the following way from the one we have im- 
plicitly assumed. This spacetime has global property G: Let two 
material particles have world-lines that coincide at event A, with 
some nonzero relative velocity; then there may be some event B in 
the future light-cone of A at which the particles’ world-lines coincide 
again. This sounds like a description of something that we would 
expect to happen in curved spacetime, but let’s see whether that 
is necessary. We want to know whether this violates the flat-space 
properties L1-L5 on page 412, if those properties are taken as local. 
(a) Demonstrate that it does not violate them, by using a model in 
which space “wraps around” like a cylinder. 

(b) Now consider the possibility of interpreting L1-L5 as global state- 
ments. Do spacetimes with property G always violate L3 if L3 is 
taken globally? > Solution, p. 389 


2 Usually in relativity we pick units in which c = 1. Suppose, 
however, that we want to use SI units. The convention is that co- 
ordinates are written with upper indices, so that, fixing the usual 
Cartesian coordinates in 1+1 dimensions of spacetime, an infinitesimal
displacement between two events is notated (ds^t, ds^x). In SI
units, the two components of this vector have different units, which
may seem strange but is perfectly legal. Describe the form of the
metric, including the units of its elements. Describe the lower-index
vector ds_a. > Solution, p. 389


3 (a) Explain why the following expressions ain’t got good
grammar: U_aa, x^a y^a, p^a − q_a. (Recall our notational convention that
Latin indices represent abstract indices, so that it would not make
sense, for example, to interpret U_aa as U’s ath diagonal element
rather than as an implied sum.)
(b) Which of these could also be nonsense in terms of units? 

> Solution, p. 390 


4 Suppose that a mountaineer describes her location using
coordinates (θ, φ, h), representing colatitude, longitude, and altitude.
Infer the units of the components of ds^a and of the elements of g_ab
and g^ab. Given that the units of mechanical work should be
newton-meters (cf 5, p. 48), infer the components of a force vector F_a, and
its upper-index version F^a. > Solution, p. 390


5 Generalize figure h/2 on p. 48 to three dimensions.
> Solution, p. 390


6 Suppose you have a collection of pencils, some of which
have been sharpened more times than others so that they’re
shorter. You toss them all on the floor in random orientations,
and you’re then allowed to slide them around but not to rotate
them. Someone asks you to make up a definition of whether or
not a given set of three pencils “cancels.” If all pencils are treated
equally (i.e., order doesn’t matter), and if we respect the rotational
invariance of Euclidean geometry, then you will be forced to reinvent
vector addition and define cancellation of pencils p, q, and r as
p + q + r = 0. Do something similar with “pencil” replaced by “an
oriented pair of lines as in figure h/2 on p. 48.”


7 Describe the quantity g^a_a. (Note the repeated index.)
> Solution, p. 390 


8 Example 17 on page 113 discusses the discontinuity that 
would result if one attempted to define a time coordinate for the 
GPS system that was synchronized globally according to observers 
in the rotating frame, in the sense that neighboring observers could 
verify the synchronization by exchanging electromagnetic signals. 
Calculate this discontinuity at the equator, and estimate the re- 
sulting error in position that would be experienced by GPS users. 
> Solution, p. 390 


9 Resolve the following paradox. 


Equation [3] on page 112 claims to give the metric obtained by an ob- 
server on the surface of a rotating disk. This metric is shown to lead 
to a non-Euclidean value for the ratio of the circumference of a circle 
to its radius, so the metric is clearly non-Euclidean. Therefore a lo- 
cal observer should be able to detect violations of the Pythagorean 
theorem. 


And yet this metric was originally derived by a series of changes 
of coordinates, starting from the Euclidean metric in polar coordi- 
nates, as derived in example 8 on page 103. Section 3.4 (p. 96) 
argued that the intrinsic measurements available in relativity are 
not capable of detecting an arbitrary smooth, one-to-one change of 
coordinates. This contradicts our earlier conclusion that there are 
locally detectable violations of the Pythagorean theorem. 
> Solution, p. 390 


10 This problem deals with properties of the metric [3] on page 
112. (a) A pulse of collimated light is emitted from the center of 
the disk in a certain direction. Does the spatial track of the pulse 
form a geodesic of this metric? (b) Characterize the behavior of the 
geodesics near r = 1/ω. (c) An observer at rest with respect to the
surface of the disk proposes to verify the non-Euclidean nature of 
the metric by doing local tests in which right triangles are formed 
out of laser beams, and violations of the Pythagorean theorem are 
detected. Will this work? > Solution, p. 391


11 In the early decades of relativity, many physicists were in the
habit of speaking as if the Lorentz transformation described what an 
observer would actually “see” optically, e.g., with an eye or a camera. 
This is not the case, because there is an additional effect due to opti- 
cal aberration: observers in different states of motion disagree about 
the direction from which a light ray originated. This is analogous 
to the situation in which a person driving in a convertible observes 
raindrops falling from the sky at an angle, even if an observer on the 
sidewalk sees them as falling vertically. In 1959, Terrell and Penrose 
independently provided correct analyses,17 showing that in reality
an object may appear contracted, expanded, or rotated, depending 
on whether it is approaching the observer, passing by, or receding. 
The case of a sphere is especially interesting. Consider the following 
four cases: 


A The sphere is not rotating. The sphere’s center is at rest. The 
observer is moving in a straight line. 


B The sphere is not rotating, but its center is moving in a straight 
line. The observer is at rest. 


C The sphere is at rest and not rotating. The observer moves 
around it in a circle whose center coincides with that of the 
sphere. 


D The sphere is rotating, with its center at rest. The observer is 
at rest. 


Penrose showed that in case A, the outline of the sphere is still 
seen to be a circle, although regions on the sphere’s surface appear 
distorted. 


What can we say about the generalization to cases B, C, and D? 
> Solution, p. 391 


12 This problem involves a relativistic particle of mass m which 
is also a wave, as described by quantum mechanics. Let c = 1 and
ħ = 1 throughout. Starting from the de Broglie relations E = ω
and p = k, where k is the wavenumber, find the dispersion relation
connecting ω to k. Calculate the group velocity, and verify that it is
consistent with the usual relations p = mγv and E = mγ for m > 0.
What goes wrong if you instead try to associate v with the phase 
velocity? > Solution, p. 391 


17 James Terrell, “Invisibility of the Lorentz Contraction,” Physical Review 116
(1959) 1045. Roger Penrose, “The Apparent Shape of a Relativistically Moving
Sphere,” Proceedings of the Cambridge Philosophical Society 55 (1959) 139.


Chapter 4


Tensors


We now have enough machinery to be able to calculate quite a bit of 
interesting physics, and to be sure that the results are actually mean- 
ingful in a relativistic context. The strategy is to identify relativistic 
quantities that behave as Lorentz scalars and Lorentz vectors, and 
then combine them in various ways. The notion of a tensor has been 
introduced on page 102. A Lorentz scalar is a tensor of rank 0, and 
a Lorentz vector is a rank-1 tensor. 


4.1 Lorentz scalars 


A Lorentz scalar is a quantity that remains invariant under both
spatial rotations and Lorentz boosts. Mass is a Lorentz scalar.1
Electric charge is also a Lorentz scalar, as demonstrated to extremely
high precision by experiments measuring the electrical neutrality of
atoms and molecules to a relative precision of better than 10^-20; the
electron in a hydrogen atom typically has a velocity of about 1/100,
and those in heavier elements such as uranium are highly relativis- 
tic, so any violation of Lorentz invariance would give the atoms a 
nonvanishing net electric charge. 


The time measured by a clock traveling along a particular world- 
line from one event to another is something that all observers will 
agree upon; they will simply note the mismatch with their own 
clocks. It is therefore a Lorentz scalar. This clock-time as measured 
by a clock attached to the moving body in question is often referred 
to as proper time, “proper” being used here in the somewhat archaic 
sense of “own” or “self,” as in “The Vatican does not lie within Italy 
proper.” Proper time, which we notate τ, can only be defined for
timelike world-lines, since a lightlike or spacelike world-line isn’t 
possible for a material clock. 


More generally, when we express a metric as ds^2 = ..., the
quantity ds is a Lorentz scalar. In the special case of a timelike
world-line, ds and dτ are the same thing. (In books that use a
−+++ metric, one has ds = −dτ.)


Even more generally, affine parameters, which exist independent
of any metric at all, are scalars. As a trivial example, if τ is a
particular object’s proper time, then τ is a valid affine parameter,
but so is 2τ + 7. Less trivially, a photon’s proper time is always zero,
but one can still define an affine parameter along its trajectory. We
will need such an affine parameter, for example, in section 6.2.8,
page 233, when we calculate the deflection of light rays by the sun,
one of the early classic experimental tests of general relativity.


1 Some older books define mass as transforming according to m → γm, which
can be made to give a self-consistent theory, but is ugly.


Another example of a Lorentz scalar is the pressure of a perfect 
fluid, which is often assumed as a description of matter in cosmo- 
logical models. 


Infinitesimals and the clock “postulate” Example: 1 
At the beginning of chapter 3, I motivated the use of infinitesimals
as useful tools for doing differential geometry in curved spacetime.
Even in the context of special relativity, however, infinitesimals
can be useful. One way of expressing the proper time accumulated
on a moving clock is

\[
  s = \int ds
    = \int \sqrt{g_{ij}\,dx^i\,dx^j}
    = \int \sqrt{1 - \left(\frac{dx}{dt}\right)^2
                   - \left(\frac{dy}{dt}\right)^2
                   - \left(\frac{dz}{dt}\right)^2}\;dt ,
\]

which only contains an explicit dependence on the clock’s velocity,
not its acceleration. This is an example of the clock “postulate”
referred to in the remark at the end of homework problem 1 on
page 83. Note that the clock postulate only applies in the limit of
a small clock. This is represented in the above equation by the
use of infinitesimal quantities like dx.
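
The integral for the proper time can be evaluated numerically for any world-line whose speed stays below that of light. The following Python sketch does this in one spatial dimension, in units with c = 1. The function names and the two sample world-lines are hypothetical choices made for illustration, and the midpoint rule with a finite-difference velocity is just one simple way to approximate the integral.

```python
import math

def proper_time(x_of_t, t0, t1, n=100_000):
    """Approximate the integral of sqrt(1 - (dx/dt)^2) dt along x(t),
    using the midpoint rule with a centered finite-difference velocity."""
    dt = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t_mid = t0 + (k + 0.5) * dt
        v = (x_of_t(t_mid + 0.5 * dt) - x_of_t(t_mid - 0.5 * dt)) / dt
        total += math.sqrt(1.0 - v * v) * dt
    return total

# Hypothetical world-lines chosen for illustration.
def uniform(t):
    return 0.6 * t                 # constant velocity 0.6

def oscillating(t):
    return 0.5 * math.sin(t)       # accelerated, speed never exceeds 0.5

print(proper_time(uniform, 0.0, 10.0))      # close to 8, since sqrt(1 - 0.36) = 0.8
print(proper_time(oscillating, 0.0, 10.0))  # less than the coordinate time 10
```

Only the velocity enters the integrand, so the accelerated world-line is handled by exactly the same code as the uniform one.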


4.2 Four-vectors 


4.2.1 The velocity and acceleration four-vectors 


Our basic Lorentz vector is the spacetime displacement dx^i. Any
other quantity that has the same behavior as dx^i under rotations
and boosts is also a valid Lorentz vector. Consider a particle moving
through space, as described in a Lorentz frame. Since the particle
may be subject to nongravitational forces, the Lorentz frame cannot
be made to coincide (except perhaps momentarily) with the
particle’s rest frame. If dx^i is not lightlike, then the corresponding
infinitesimal proper time interval dτ is nonzero. As with Newtonian
three-vectors, dividing a four-vector by a Lorentz scalar produces
another quantity that transforms as a four-vector, so dividing the
infinitesimal displacement by a nonzero infinitesimal proper time
interval, we have the four-velocity vector v^i = dx^i/dτ, whose
components in a Lorentz coordinate system are (γ, γu^1, γu^2, γu^3), where
(u^1, u^2, u^3) is the ordinary three-component velocity vector as
defined in classical mechanics. The four-velocity’s squared magnitude
v^i v_i is always exactly 1, even though the particle is not moving at
the speed of light. (If it were moving at the speed of light, we would
have dτ = 0, and v would be undefined.)
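
The claim that the velocity four-vector always has squared magnitude 1 is easy to check numerically. The Python sketch below builds the components (γ, γu^1, γu^2, γu^3) from an ordinary three-velocity and evaluates the inner product with the +−−− signature, in units with c = 1. The helper names and the sample velocity are assumptions made for this illustration.

```python
import math

def gamma(u):
    """Lorentz factor for a three-velocity u = (u1, u2, u3), with c = 1."""
    speed_sq = sum(c * c for c in u)
    return 1.0 / math.sqrt(1.0 - speed_sq)

def four_velocity(u):
    """Components (gamma, gamma*u1, gamma*u2, gamma*u3)."""
    g = gamma(u)
    return (g, g * u[0], g * u[1], g * u[2])

def inner(a, b):
    """Inner product with signature +---."""
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

v = four_velocity((0.3, -0.4, 0.5))   # hypothetical three-velocity
print(v)
print(inner(v, v))                    # equals 1 up to rounding error
```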


When we hear something referred to as a “vector,” we usually 
take this as a statement that it not only transforms as a vector, but
also that it adds as a vector. But we have already seen in section 
2.3.1 on page 65 that even collinear velocities in relativity do not 
add linearly; therefore they clearly cannot add linearly when dressed 
in the clothing of four-vectors. We’ve also seen in section 2.5.3 that 
the combination of non-collinear boosts is noncommutative, and is 
generally equivalent to a boost plus a spatial rotation; this is also 
not consistent with linear addition of four-vectors. At the risk of 
beating a dead horse, a four-velocity’s squared magnitude is always 
1, and this is not consistent with being able to add four-velocity 
vectors. 
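
A short numerical check makes the point about non-additivity concrete. In this Python sketch, restricted to one spatial dimension with c = 1, two velocity vectors are added component by component and the result is shown not to have unit magnitude. The function names are hypothetical, and the combination rule (u1 + u2)/(1 + u1 u2) is the standard relativistic formula for collinear velocities.

```python
import math

def four_velocity_1d(u):
    """(t, x) components of the velocity vector for speed u, with c = 1."""
    g = 1.0 / math.sqrt(1.0 - u * u)
    return (g, g * u)

def norm_sq(v):
    """Squared magnitude with signature +-."""
    return v[0] ** 2 - v[1] ** 2

def combine(u1, u2):
    """Relativistic combination of collinear velocities."""
    return (u1 + u2) / (1.0 + u1 * u2)

a = four_velocity_1d(0.6)
b = four_velocity_1d(0.6)
naive_sum = (a[0] + b[0], a[1] + b[1])

print(norm_sq(a), norm_sq(b))   # each is 1 up to rounding
print(norm_sq(naive_sum))       # approximately 4, so the sum is not a velocity vector
print(combine(0.6, 0.6))        # about 0.882, not 1.2
```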


A zero velocity vector? Example: 2
> Suppose an object has a certain four-velocity v^i in a certain
frame of reference. Can we transform into a different frame in
which the object is at rest, and its four-velocity is zero?


> No. In general, the Lorentz transformation preserves the magnitude
of vectors, so it can never transform a vector with a nonzero
magnitude into one with zero magnitude. Since this is a material
object (not a ray of light) we can transform into a frame in
which the object is at rest, but an object at rest does not have a 
vanishing four-velocity. It has a four-velocity of (1,0, 0, 0). 
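
The answer to example 2 can be checked by applying a boost explicitly. The Python sketch below, in units with c = 1, boosts the velocity vector of an object moving at a hypothetical speed of 0.8 along x into the object's rest frame. The function names are assumptions made for this illustration.

```python
import math

def boost_x(vec, beta):
    """Lorentz boost along x by velocity beta (c = 1), acting on (t, x, y, z)."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    t, x, y, z = vec
    return (g * (t - beta * x), g * (x - beta * t), y, z)

def norm_sq(vec):
    """Squared magnitude with signature +---."""
    t, x, y, z = vec
    return t * t - x * x - y * y - z * z

beta = 0.8
g = 1.0 / math.sqrt(1.0 - beta * beta)
v = (g, g * beta, 0.0, 0.0)     # velocity vector of an object moving at 0.8

v_rest = boost_x(v, beta)       # transform into the object's rest frame
print(v_rest)                   # approximately (1, 0, 0, 0)
print(norm_sq(v), norm_sq(v_rest))
```

The spatial components vanish in the rest frame, but the timelike component does not, and the squared magnitude is the same in both frames.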


Example 2 suggests a nice way of thinking about velocity vectors, 
which is that every velocity vector represents a potential observer. 
An observer is a material object, and therefore has a timelike veloc- 
ity vector. This observer writes her own velocity vector as (1,0, 0,0), 
i.e., as the unit vector in the timelike direction. Often when we see 
an expression involving a velocity vector, we can interpret it as de- 
scribing a measurement taken by a specific observer. 


Orthogonality as simultaneity Example: 3 
In a space where the inner product can be negative, orthogonality 
doesn’t mean what our euclidean intuition thinks it means. For ex- 
ample, a lightlike vector can be orthogonal to itself — a situation 
that never occurs in a euclidean space. 


Suppose we have a timelike vector t and a spacelike one x. What
would it mean for t and x to be orthogonal, with t · x = 0? Since t is
timelike, we can make a unit vector t̂ = t/|t| out of it, and interpret
t̂ as the velocity vector of some hypothetical observer. We then
know that in that observer’s frame, t̂ is simply a unit vector along
the time axis. It now becomes clear that x must be parallel to the
x axis, i.e., it represents a displacement between two events that
this observer considers to be simultaneous.


This is an example of the idea that expressions involving velocity
vectors can be interpreted as measurements taken by a certain
observer. The expression t · x = 0 can be interpreted as meaning
that according to an observer whose world-line is tangent to t, x
represents a relationship of simultaneity.
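
The interpretation of orthogonality as simultaneity can be tested numerically in 1+1 dimensions. In this Python sketch (c = 1), a unit timelike vector is constructed for an observer moving at a hypothetical speed of 0.5, a spacelike vector orthogonal to it is chosen by hand, and both are boosted into the observer's frame. All names and numbers are assumptions made for illustration.

```python
import math

def inner(a, b):
    """Inner product in 1+1 dimensions with signature +-."""
    return a[0] * b[0] - a[1] * b[1]

def boost(vec, beta):
    """Lorentz boost by velocity beta, acting on (t, x)."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return (g * (vec[0] - beta * vec[1]), g * (vec[1] - beta * vec[0]))

beta = 0.5
g = 1.0 / math.sqrt(1.0 - beta * beta)
t_hat = (g, g * beta)        # unit timelike vector: observer moving at 0.5
x = (beta * 2.0, 2.0)        # spacelike vector chosen so that t_hat . x = 0

print(inner(t_hat, x))       # 0 up to rounding
print(boost(t_hat, beta))    # approximately (1, 0): the observer's time axis
print(boost(x, beta))        # time component approximately 0: simultaneous events
```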


The four-acceleration is found by taking a second derivative with 
respect to proper time. Its squared magnitude is only approximately 
equal to minus the squared magnitude of the Newtonian acceleration 
three-vector, in the limit of small velocities. 


Constant acceleration Example: 4
> Suppose a spaceship moves so that the acceleration is judged
to be the constant value a by an observer on board. Find the
motion x(t) as measured by an observer in an inertial frame.


> Let τ stand for the ship’s proper time, and let dots indicate
derivatives with respect to τ. The ship’s velocity has magnitude
1, so

\[ \dot{t}^2 - \dot{x}^2 = 1 . \]

An observer who is instantaneously at rest with respect to the
ship judges it to have a four-acceleration (0, a, 0, 0) (because the
low-velocity limit applies). The observer in the (t, x) frame agrees
on the magnitude of this vector, so

\[ \ddot{t}^2 - \ddot{x}^2 = -a^2 . \]

The solution of these differential equations is t = (1/a) sinh aτ,
x = (1/a) cosh aτ, and eliminating τ gives

\[ x = \frac{1}{a}\sqrt{1 + a^2 t^2} . \]

As t approaches infinity, dx/dt approaches the speed of light.
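
The hyperbolic motion found in this example can be tabulated with a few lines of Python. The sketch below uses the parametrization t = (1/a) sinh aτ, x = (1/a) cosh aτ in units with c = 1, checks it against the closed form for x(t), and evaluates dx/dt = at/sqrt(1 + a^2 t^2), which follows from differentiating that closed form. The value a = 1 and the function names are hypothetical choices for illustration.

```python
import math

def ship_position(a, tau):
    """(t, x) of the ship at proper time tau, for proper acceleration a."""
    return (math.sinh(a * tau) / a, math.cosh(a * tau) / a)

def x_of_t(a, t):
    """Closed form obtained by eliminating the proper time."""
    return math.sqrt(1.0 + (a * t) ** 2) / a

def coordinate_velocity(a, t):
    """dx/dt obtained by differentiating x(t) analytically."""
    return a * t / math.sqrt(1.0 + (a * t) ** 2)

a = 1.0  # hypothetical value, in units where c = 1
for tau in (0.0, 0.5, 1.0, 2.0, 5.0):
    t, x = ship_position(a, tau)
    print(f"tau={tau:4.1f}  t={t:8.3f}  x={x:8.3f}  "
          f"x(t)={x_of_t(a, t):8.3f}  dx/dt={coordinate_velocity(a, t):.6f}")
```

The last column approaches 1 but never reaches it, in agreement with the statement that dx/dt approaches the speed of light.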


4.2.2 The momentum four-vector 
Definition for a material particle 


If we hope to find something that plays the role of momentum 
in relativity, then the momentum three-vector probably needs to 
be generalized to some kind of four-vector. If so, then the law of 
conservation of momentum will be valid regardless of one’s frame of 
reference, which is necessary.2


If we are to satisfy the correspondence principle then the
relativistic definition of momentum should probably look as much as
possible like the nonrelativistic one. In subsection 4.2.1, we defined
the velocity four-vector in the case of a particle whose dx^i is not
lightlike. Let’s assume for the moment that it makes sense to think
of mass as a scalar. As with Newtonian three-vectors, multiplying
a four-vector by a Lorentz scalar produces another quantity
that transforms as a four-vector. We therefore conjecture that the
four-momentum of a material particle can be defined as p^i = mv^i,
which in Lorentz coordinates is (mγ, mγu^1, mγu^2, mγu^3), where
(u^1, u^2, u^3) is the ordinary three-velocity. There is
no a priori guarantee that this is right, but it’s the most reasonable
thing to guess. It needs to be checked against experiment, and also
for consistency with the other parts of our theory.


2 We are not guaranteed that this is the right way to proceed, since the
converse is not true: some three-vectors such as the electric and magnetic fields are
embedded in rank-2 tensors in more complicated ways than this. See section
4.2.4, p. 136.


The spacelike components look like the classical momentum vector
multiplied by a factor of γ, the interpretation being that to an
observer in this frame, the moving particle’s inertia is increased rel- 
ative to its value in the particle’s rest frame. Such an effect is indeed 
observed experimentally. This is why particle accelerators are so big 
and expensive. As the particle approaches the speed of light, γ
diverges, so greater and greater forces are needed in order to produce
the same acceleration. In relativistic scattering processes with ma- 
terial particles, we find empirically that the four-momentum we’ve 
defined is conserved, which confirms that our conjectures above are 
valid, and in particular that the quantity we’re calling m can be 
treated as a Lorentz scalar, and this is what all physicists do today. 
The reader is cautioned, however, that up until about 1950, it was 
common to use the word “mass” for the combination mγ (which
is what occurs in the Lorentz-coordinate form of the momentum 
vector), while referring to m as the “rest mass.” This archaic termi- 
nology is only used today in some popular-level books and low-level 
school textbooks. 
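
The behavior of the momentum four-vector at different speeds can be tabulated directly. The Python sketch below, in units with c = 1, computes the components of p for motion along x, compares the spacelike component with the Newtonian product of mass and velocity, and confirms that the magnitude of p is the mass m at every speed. The mass value and function names are hypothetical, chosen only for illustration.

```python
import math

def four_momentum(m, u):
    """p = m v for a particle of mass m and three-velocity u, with c = 1."""
    g = 1.0 / math.sqrt(1.0 - sum(c * c for c in u))
    return (m * g, m * g * u[0], m * g * u[1], m * g * u[2])

def norm_sq(p):
    """Squared magnitude with signature +---."""
    return p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2

m = 2.0  # hypothetical mass in arbitrary units
for speed in (0.0, 0.1, 0.9, 0.999):
    p = four_momentum(m, (speed, 0.0, 0.0))
    newtonian = m * speed
    print(f"u={speed:5.3f}  p_x={p[1]:10.4f}  Newtonian mu={newtonian:7.4f}  "
          f"sqrt(p.p)={math.sqrt(norm_sq(p)):.6f}")
```

At low speeds the relativistic and Newtonian momenta nearly agree, while near the speed of light the relativistic value grows without bound even though the invariant magnitude stays fixed at m.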


Equivalence of mass and energy 


The momentum four-vector has locked within it the reason for 
Einstein’s famous E = mc^2, which in our relativistic units becomes
simply E = m. To see why, consider the experimentally measured 
inertia of a physical object made out of atoms. The subatomic 
particles are all moving, and many of the velocities, e.g., the ve- 
locities of the electrons, are quite relativistic. This has the effect 
of increasing the experimentally determined inertial mass of the 
whole object, by a factor of γ averaged over all the particles, even
though the masses of the individual particles are invariant Lorentz
scalars. (This same increase must also be observed for the gravitational
mass, based on the equivalence principle as verified by Eötvös
experiments.)


Now if the object is heated, the velocities will increase on the 
average, resulting in a further increase in its mass. Thus, a certain 
amount of heat energy is equivalent to a certain amount of mass. 
But if heat energy contributes to mass, then the same must be true 
for other forms of energy. For example, suppose that heating leads to 
a chemical reaction, which converts some heat into electromagnetic 
binding energy. If one joule of binding energy did not convert to
the same amount of mass as one joule of heat, then this would allow 
the object to spontaneously change its own mass, and then by con- 
servation of momentum it would have to spontaneously change its 
own velocity, which would clearly violate the principle of relativity. 
We conclude that mass and energy are equivalent, both inertially 
and gravitationally. In relativity, neither is separately conserved; 
the conserved quantity is their sum, referred to as the mass-energy, 
E. An alternative derivation, by Einstein, is given in example 16 on 
page 135. 


Energy is the timelike component of the four-momentum 


The Lorentz transformation of a zero vector is always zero. This 
means that the momentum four-vector of a material object can’t 
equal zero in the object’s rest frame, since then it would be zero 
in all other frames as well. So for an object of mass m, let its 
momentum four-vector in its rest frame be (f(m),0,0,0), where f 
is some function that we need to determine, and f can depend only 
on m since there is no other property of the object that can be 
dynamically relevant here. Since conservation laws are additive, f 
has to be f(m) = km for some universal constant k. In units where c = 1,
k is unitless. Since we want to recover the appropriate Newtonian
limit for massive bodies, and since v^t = 1 in that limit, we need k =
1. Transforming the momentum four-vector from the particle’s rest
frame into some other frame, we find that the timelike component 
is no longer m. We interpret this as the relativistic mass-energy, E. 


Since the momentum four-vector was obtained from the magnitude-1
velocity four-vector through multiplication by m, its squared
magnitude p^i p_i is equal to the square of the particle’s mass. Writing
p for the magnitude of the momentum three-vector, and E for the
mass-energy, we find the useful relation m² = E² − p². We take this
to be the relativistic definition of the mass of any particle, including
one whose dx^i is lightlike.
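The relation m² = E² − p² can be checked numerically. The following is a minimal sketch, assuming 1+1 dimensions and units with c = 1; the function names are hypothetical and chosen only for this illustration. It builds (E, p) = (mγ, mγv) for several velocities and recovers the same m each time.

```python
import math

def gamma(v):
    """Lorentz factor for speed v, in units where c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def four_momentum(m, v):
    """(E, p) in 1+1 dimensions for a particle of mass m and velocity v."""
    g = gamma(v)
    return (m * g, m * g * v)

def invariant_mass(E, p):
    """Mass recovered from m^2 = E^2 - p^2."""
    return math.sqrt(E * E - p * p)

m = 1.0
for v in (0.0, 0.1, 0.6, 0.99):
    E, p = four_momentum(m, v)
    # E and p change with v, but the recovered mass stays at m
    # (up to floating-point rounding).
    print(v, E, p, invariant_mass(E, p))
```

The timelike component E grows with γ while the quantity E² − p² stays fixed, which is the sense in which m is a Lorentz scalar.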


Particles traveling at c 


The definition of four-momentum as p^i = mv^i only works for
particles that move at less than c. For those that move at c, the
four-velocity is undefined. As we’ll see in example 6 on p. 129, this
class of particles is exactly those that are massless. As shown on
p. 32, the three-momentum of a light wave is given by p = E. The
fact that this momentum is nonzero implies that for light p^i = mv^i
represents an indeterminate form. The fact that this momentum
equals E is consistent with our definition of mass as m² = E² − p².


Mass is not additive 


Since the momentum four-vector p^a is additive, and our definition
of mass through m² = p^a p_a depends on the vector in a nonlinear way, it
follows that mass is not additive (even for particles that are not
interacting but are simply considered collectively).


Mass of two light waves Example: 5 
Let the momentum of a certain light wave be (p^t, p^x) = (E, E),
and let another such wave have momentum (E, −E). The total
momentum is (2E, 0). Thus this pair of massless particles has a
collective mass of 2E.
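The arithmetic of this example can be sketched in a few lines. This is only an illustration, assuming 1+1 dimensions and c = 1; `system_mass` is a hypothetical helper, not something defined in the text. It adds the four-momenta first and only then applies m² = E² − p².

```python
import math

def system_mass(momenta):
    """Mass of a collection of particles, computed from the summed
    four-momentum. Each entry is (E, px), with c = 1."""
    E_total = sum(E for E, _ in momenta)
    p_total = sum(p for _, p in momenta)
    # max() guards against a tiny negative value from rounding.
    return math.sqrt(max(E_total**2 - p_total**2, 0.0))

E = 1.0
rightward = (E, E)    # light wave moving in the +x direction
leftward = (E, -E)    # light wave moving in the -x direction

print(system_mass([rightward]))             # 0.0: a single wave is massless
print(system_mass([rightward, leftward]))   # 2.0: the pair has mass 2E
print(system_mass([rightward, rightward]))  # 0.0: parallel waves stay massless
```

The last line shows that the result depends on the relative directions of the waves, not just on how many there are.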


Massless particles travel at c Example: 6 
We demonstrate this by showing that if we suppose the opposite, 
then there are two different consequences, either of which would 
be physically unacceptable. 


When a particle does have a nonvanishing mass, we have 


\[ \lim_{E/m \to \infty} |v| = \lim_{E/m \to \infty} \frac{|p|}{E} = 1 . \]

Thus if we had a massless particle with |v| ≠ 1, its behavior
would be different from the limiting behavior of massive particles.
But this is physically unacceptable because then we would have
a magic method for detecting arbitrarily small masses such as
10^(−10000000000) kg. We don’t actually know that the photon, for
example, is exactly massless; see example 13 on p. 131.


Furthermore, suppose that a massless particle had |v| < 1 in 
the frame of some observer. Then some other observer could 
be at rest relative to the particle. In such a frame, the particle’s 
three-momentum p is zero by symmetry, since there is no preferred
direction for it. Then E² = p² + m² is zero as well, so
the particle’s entire energy-momentum four-vector is zero. But 
a four-vector that vanishes in one frame also vanishes in every 
other frame. That means we're talking about a particle that can’t 
undergo scattering, emission, or absorption, and is therefore un- 
detectable by any experiment. This is physically unacceptable 
because we don’t consider phenomena (e.g., invisible fairies) to 
be of physical interest if they are undetectable even in principle. 


Gravitational redshifts Example: 7 
Since a photon’s energy E is equivalent to a certain gravitational
mass m, photons that rise or fall in a gravitational field must
lose or gain energy, and this should be observed as a redshift
or blueshift in the frequency. We expect the change in gravitational
potential energy to be EΔφ, giving a corresponding opposite
change in the photon’s energy, so that ΔE/E = Δφ. In
metric units, this becomes ΔE/E = Δφ/c², and in the field near
the Earth’s surface we have ΔE/E = gh/c². This is the same
result that was found in section 1.5.5 based only on the equivalence
principle, and verified experimentally by Pound and Rebka
as described in section 1.5.6.
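To get a feel for the size of the effect, here is a rough numerical sketch of the magnitude gh/c². The height of 22.5 m is an assumption made for this illustration (roughly the scale of a tower experiment of the Pound-Rebka type); the function name is likewise hypothetical.

```python
G_SURFACE = 9.8        # m/s^2, gravitational field near the Earth's surface
C_LIGHT = 2.998e8      # m/s, speed of light

def fractional_shift(height_m):
    """Magnitude of the fractional energy shift gh/c^2 for a photon
    that rises or falls through height_m near the Earth's surface."""
    return G_SURFACE * height_m / C_LIGHT**2

# Assumed height, for illustration only.
print(fractional_shift(22.5))
```

The result is a few parts in 10¹⁵, which shows why a very sharply defined spectral line is needed to detect the shift over laboratory distances.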


Constraints on polarization Example: 8 
We observe that electromagnetic waves are always polarized 
transversely, never longitudinally. Such a constraint can only apply
to a wave that propagates at c. If it applied to a wave that
propagated at less than c, we could move into a frame of refer- 
ence in which the wave was at rest. In this frame, all directions in 
space would be equivalent, and there would be no way to decide 
which directions of polarization should be permitted. For a wave 
that propagates at c, there is no frame in which the wave is at rest 
(see p. 99). 


Relativistic work-energy theorem Example: 9 
In Einstein’s original 1905 paper on relativity, he assumed without 
providing any justification that the Newtonian work-energy rela- 
tion W = Fd was valid relativistically. One way of justifying this is 
that we can construct a simple machine with a mechanical advan- 
tage A and a reduction of motion by 1/A, with these ratios being 
exact relativistically.³ One can then calculate, as Einstein did,

\[ W = \int \frac{dp}{dt}\,dx = \int \frac{dp}{dv}\,\frac{dx}{dt}\,dv = m(\gamma - 1) , \]

which is consistent with our result for E as a function of γ if we
equate it to E(γ) − E(1).
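The integral in this example can be checked numerically. The sketch below assumes p = mγv with c = 1, so that dp/dv = mγ³ and dx/dt = v, and uses a plain midpoint rule; the function names and step count are choices made for this illustration, not part of the text.

```python
import math

def gamma(v):
    """Lorentz factor for speed v, with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def work_done(m, v_final, steps=100000):
    """Midpoint-rule estimate of W = integral of (dp/dv)(dx/dt) dv
    from 0 to v_final, using dp/dv = m*gamma^3 and dx/dt = v."""
    dv = v_final / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * dv
        total += m * gamma(v) ** 3 * v * dv
    return total

m, v = 1.0, 0.8
print(work_done(m, v))        # numerical integral
print(m * (gamma(v) - 1.0))   # closed form m(gamma - 1)
```

The two printed values should agree to many decimal places, and for small v both reduce to the Newtonian (1/2)mv².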


The Dirac sea Example: 10 
A great deal of physics can be derived from T.H. White’s
principle that “whatever is not forbidden is compulsory” (originally
intended for ants but applied to particles by Gell-Mann).
In quantum mechanics, any process that is not forbidden by a
conservation law is supposed to occur. The relativistic relation
E = ±√(p² + m²) has two roots, a positive one and a negative one.
The positive-energy and negative-energy states are separated by
a no-man’s land of width 2m, so no continuous classical process
can lead from one side to the other. But quantum-mechanically, if
an electron exists with energy E = +√(p² + m²), it should be able to
make a quantum leap into a state with E = −√(p² + m²), emitting
the energy difference of 2E in the form of photons. Why doesn’t
this happen? One explanation is that the states with E < 0 are all
already occupied. This is the “Dirac sea,” which we now interpret
as being full of electrons. A vacancy in the sea manifests itself as
an antielectron.


Massive neutrinos Example: 11 
Neutrinos were long thought to be massless, but are now believed 
to have masses in the eV range. If they had been massless, they 
would always have had to propagate at the speed of light. Al- 
though they are now thought to have mass, that mass is six or- 
ders of magnitude less than the MeV energy scale of the nuclear 
reactions in which they are produced, so all neutrinos observed 
in experiments are moving at velocities very close to the speed of 
light. 
³For an explicit example, see bit.ly/1aUXIa8.


No radioactive decay of massless particles Example: 12 
A photon cannot decay into an electron and a positron, γ → e⁺ +
e⁻, in the absence of a charged particle to interact with. To see
this, consider the process in the frame of reference in which the
electron-positron pair has zero total momentum. In this frame, the
photon must have had zero (three-)momentum, but a photon with
zero momentum must have zero energy as well. This means that
conservation of relativistic four-momentum has been violated: the
timelike component of the four-momentum is the mass-energy,
and it has increased from 0 in the initial state to at least 2mc² in
the final state.


To demonstrate the consistency of the theory, we can arrive at the 
same conclusion by a different method. Whenever a particle has 
a small mass (small compared to its energy, say), it must travel 
at close to c. It must therefore have a very large time dilation, 
and will take a very long time to undergo radioactive decay. In 
the limit as the mass approaches zero, the time required for the 
decay approaches infinity. Another way of saying this is that the 
rate of radioactive decay must be fixed in terms of proper time, 
but there is no such thing as proper time for a massless particle. 
Thus it is not only this specific process that is forbidden, but any 
radioactive decay process involving a massless particle. 


There are various loopholes in this argument. The question is 
investigated more thoroughly by Fiore and Modanese.⁴


Massive photons Example: 13 
Continuing in the same vein as example 11, we can consider the 
possibility that the photon has some nonvanishing mass. A 2003 
experiment by Luo et al.⁵ has placed a limit of about 10⁻⁵⁴ kg
on this mass. This is incredibly small, but suppose that future
experimental work using improved techniques shows that the mass
is less than this, but actually nonzero. A naive reaction to this 
scenario is that it would shake relativity to its core, since relativity 
is based upon the assumption that the speed of light is a con- 
stant, whereas for a massive particle it need not be constant. But 
this is a misinterpretation of the role of c in relativity. As should 
be clear from the approach taken in section 2.2, c is primarily a 
geometrical property of spacetime, not a property of light. 


In reality, such a discovery would be more of a problem for particle
physicists than for relativists, as we can see by the following
sketch of an argument. Imagine two charged particles, at rest,
interacting via an electrical attraction. Quantum mechanics describes
this as an exchange of photons. Since the particles are
at rest, there is no source of energy, so where do we get the
energy to make the photons? The Heisenberg uncertainty principle,
ΔEΔt ≳ h, allows us to steal this energy, provided that we
give it back within a time Δt. This time limit imposes a limit on
the distance the photons can travel, but by using photons of low
enough energy, we can make this distance limit as large as we
like, and there is therefore no limit on the range of the force. But
suppose that the photon has a mass. Then there is a minimum
mass-energy mc² required in order to create a photon, the maximum
time is h/mc², and the maximum range is h/mc. Refining
these crude arguments a little, one finds that exchange of zero-mass
particles gives a force that goes like 1/r², while a nonzero
mass results in e^(−μr)/r², where μ⁻¹ = h/mc. For the photon,
the best current mass limit corresponds to μ⁻¹ ≳ 10¹¹ m, so the
deviation from 1/r² would be difficult to measure in earthbound
experiments.

⁴http://arxiv.org/abs/hep-th/9508018

⁵Luo et al., “New Experimental Limit on the Photon Rest Mass with a
Rotating Torsion Balance,” Phys. Rev. Lett. 90 (2003) 081801. The interpretation
of such experiments is difficult, and this paper attracted a series of comments. A
weaker but more universally accepted bound is 8 × 10⁻⁵² kg, Davis, Goldhaber,
and Nieto, Phys. Rev. Lett. 35 (1975) 1402.


Now Gauss’s law is a specific characteristic of 1/r² fields. It would
be violated slightly if photons had mass. We would have to modify
Maxwell’s equations, and it turns out⁶ that the necessary change
to Gauss’s law would be of the form ∇ · E = (...)ρ − (...)μ²Φ,
where Φ is the electrical potential, and (...) indicates factors
that depend on the choice of units. This tells us that Φ, which
in classical electromagnetism can only be measured in terms of
differences between different points in space, can now be measured
in absolute terms. Gauge symmetry has been broken. But
gauge symmetry is indispensable in creating well-behaved relativistic
field theories, and this is the reason that, in general, particle
physicists have a hard time with forces arising from the exchange
of massive particles. The hypothetical Higgs particle,
which may be observed at the Large Hadron Collider in the near 
future, is essentially a mechanism for wriggling out of this difficulty 
in the case of the massive W and Z particles that are responsible 
for the weak nuclear force; the mechanism cannot, however, be 
extended to allow a massive photon. 


Dust and radiation in cosmological models Example: 14 
In cosmological models, one needs an equation of state that relates
the pressure P to the mass-energy density ρ. The pressure
is a Lorentz scalar. The mass-energy density is not (since mass-energy
is just the timelike component of a particular vector), but
in a coordinate system without any net flow of mass, we can approximate
it as one.


The early universe was dominated by radiation. A photon in a
box contributes a pressure on each wall that is proportional to
|p^μ|, where μ is a spacelike index. In thermal equilibrium, each of
these three degrees of freedom carries an equal amount of energy,
and since momentum and energy are equal for a massless
particle, the average momentum along each axis is equal to (1/3)E.
The resulting equation of state is P = (1/3)ρ. As the universe
expanded, the wavelengths of the photons expanded in proportion
to the stretching of the space they occupied, resulting in λ ∝ a,
where a is a distance scale describing the universe’s intrinsic
curvature at a fixed time. Since the number density of photons is
diluted in proportion to a⁻³, and the mass per photon varies as
a⁻¹, both ρ and P vary as a⁻⁴.

⁶Goldhaber and Nieto, “Terrestrial and Extraterrestrial Limits on The
Photon Mass,” Rev. Mod. Phys. 43 (1971) 277


Cosmologists refer to noninteracting, nonrelativistic materials as 
“dust,” which could mean many things, including hydrogen gas, 
actual dust, stars, galaxies, and some forms of dark matter. For 
dust, the momentum is negligible compared to the mass-energy, 
so the equation of state is P = 0, regardless of ρ. The mass-energy
density is dominated simply by the mass of the dust, so
there is no red-shift scaling of the a⁻¹ type. The mass-energy
density scales as a⁻³. Since this is a less steep dependence on
a than the a⁻⁴ of radiation, there was a point, some tens of thousands
of years after
the Big Bang, when matter began to dominate over radiation. At
this point, the rate of expansion of the universe made a transition 
to a qualitatively different behavior resulting from the change in 
the equation of state. 


In the present era, the universe’s equation of state is dominated 
by neither dust nor radiation but by the cosmological constant 
(see page 318). Figure a shows the evolution of the size of the 
universe for the three different regimes. Some of the simpler 
cases are derived in sections 8.2.7 and 8.2.8, starting on page 
341. 


4.2.3 The frequency vector and the relativistic Doppler shift 


The frequency vector was introduced in example ?? on p. ??. In 
the spirit of index-gymnastics notation, frequency is to time as the 
wavenumber k = 1/λ is to space, so when treating waves relativistically
it is natural to conjecture that there is a four-frequency f_a
made by assembling (f, k), which behaves as a Lorentz vector. This
is correct, since we already know that ∂_a transforms as a covariant
vector, and for a scalar wave of the form A = A_o exp[2πi f_a x^a] the
partial derivative operator is identical to multiplication by 2πi f_a.


As an application, consider the relativistic Doppler shift of a light
wave. For simplicity, let’s restrict ourselves to one spatial dimension.
For a light wave, f = k, so the frequency vector in 1+1 dimensions
is simply (f, f). Putting this through a Lorentz transformation, we
find

\[ f' = (1+v)\gamma f = \sqrt{\frac{1+v}{1-v}}\, f , \]

where the second form displays more clearly the symmetric form
of the relativistic relationship, such that interchanging the roles of
source and observer is equivalent to flipping the sign of v. That is,
the relativistic version only depends on the relative motion of the
source and the observer, whereas the Newtonian one also depends
on the source’s motion relative to the medium (i.e., relative to the
preferred frame in which the waves have the “right” velocity). In
Newtonian mechanics, we have f' = (1 + v)f for a moving observer.
Relativistically, there is also a time dilation of the oscillation of the
source, providing an additional factor of γ.
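The Doppler factor can be reproduced by putting the components (f, k) through a boost numerically. The sketch below assumes c = 1 and a sign convention, chosen for this illustration, in which positive v means that source and observer are approaching; the function names are hypothetical.

```python
import math

def gamma(v):
    """Lorentz factor for speed v, with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def boost_frequency_vector(f, k, v):
    """Boost the components (f, k) of the frequency vector in 1+1
    dimensions. Assumed convention: v > 0 means approach."""
    g = gamma(v)
    return (g * (f + v * k), g * (k + v * f))

def doppler_factor(v):
    """Relativistic Doppler factor sqrt((1+v)/(1-v))."""
    return math.sqrt((1.0 + v) / (1.0 - v))

f = 1.0
for v in (0.1, 0.5, 0.9):
    f_new, k_new = boost_frequency_vector(f, f, v)   # light: k = f
    # f_new equals (1+v)*gamma*f, which matches the closed-form factor;
    # the Newtonian value 1+v is printed for comparison.
    print(v, f_new, doppler_factor(v) * f, 1.0 + v)
```

After the boost the wave still satisfies f = k, as it must for light, and the factors for v and −v multiply to 1.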


This analysis is extended to 3+1 dimensions in problem 11. 


Ives-Stilwell experiments Example: 15
The relativistic Doppler shift differs from the nonrelativistic one by
the time-dilation factor γ, so that there is still a shift even when
the relative motion of the source and the observer is perpendicular
to the direction of propagation. This is called the transverse
Doppler shift. Einstein suggested this early on as a test
of relativity. However, such experiments are difficult to carry out
with high precision, because they are sensitive to any error in
the alignment of the 90-degree angle. Such experiments were
eventually performed, with results that confirmed relativity,⁷ but
one-dimensional measurements provided both the earliest tests
of the relativistic Doppler shift and the most precise ones to date.
The first such test was done by Ives and Stilwell in 1938, using the
following trick. The relativistic expression S_v = √((1 + v)/(1 − v))
for the Doppler shift has the property that S_v S_−v = 1, which differs
from the nonrelativistic result of (1 + v)(1 − v) = 1 − v². One can
therefore accelerate an ion up to a relativistic speed, measure
both the forward Doppler shifted frequency f_f and the backward
one f_b, and compute √(f_f f_b). According to relativity, this should
exactly equal the frequency f_o measured in the ion’s rest frame.


In a particularly exquisite modern version of the Ives-Stilwell idea,⁸
Saathoff et al. circulated Li⁺ ions at v = 0.064 in a storage ring.
An electron-cooler technique was used in order to reduce the
variation in velocity among ions in the beam. Since the identity
S_v S_−v = 1 is independent of v, it was not necessary to measure
v to the same incredible precision as the frequencies; it was
only necessary that it be stable and well-defined. The natural line
width was 7 MHz, and other experimental effects broadened it further
to 11 MHz. By curve-fitting the line, it was possible to achieve
results good to a few tenths of a MHz. The resulting frequencies,
in units of MHz, were:

    f_f = 582490203.44 ± 0.09
    f_b = 512671442.9 ± 0.5
    √(f_f f_b) = 546466918.6 ± 0.3
    f_o = 546466918.8 ± 0.4 (from previous experimental work)

⁷See, e.g., Hasselkamp, Mondry, and Scharmann, Zeitschrift für Physik A:
Hadrons and Nuclei 289 (1979) 151.

⁸G. Saathoff et al., “Improved Test of Time Dilation in Relativity,” Phys.
Rev. Lett. 91 (2003) 190403. A publicly available description of the experiment
is given in Saathoff’s PhD thesis, www.mpi-hd.mpg.de/ato/homes/saathoff/
diss-saathoff.pdf.


The spectacular agreement with theory has made this experiment 
a lightning rod for anti-relativity kooks. 
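The comparison can be redone directly from the quoted frequencies. The sketch below takes the central values given above (in MHz) and ignores the uncertainties. The expression for v is an added step that assumes the relativistic relation f_f/f_b = (1 + v)/(1 − v); the variable names are choices made for this illustration.

```python
import math

# Central values quoted in the text, in MHz.
f_forward = 582490203.44
f_backward = 512671442.9
f_rest = 546466918.8      # from previous experimental work

# Relativity predicts that the geometric mean equals the rest frequency.
geometric_mean = math.sqrt(f_forward * f_backward)

# Assuming f_f / f_b = (1 + v)/(1 - v), the ion velocity follows as:
v = (f_forward - f_backward) / (f_forward + f_backward)

# The Newtonian product (1 + v)(1 - v) would instead predict a mean
# lower than f_rest by a factor of sqrt(1 - v^2).
newtonian_mean = f_rest * math.sqrt(1.0 - v * v)

print(geometric_mean, f_rest, geometric_mean - f_rest)
print(v)
print(f_rest - newtonian_mean)
```

The geometric mean differs from f_o by a fraction of a MHz, within the quoted uncertainties, whereas the Newtonian expectation would be off by roughly a million MHz. The inferred velocity comes out close to the stated v = 0.064.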


If one is searching for small deviations from the predictions of
special relativity, a natural place to look is at high velocities.
Ives-Stilwell experiments have been performed at velocities as high as
0.84, and they confirm special relativity.⁹


Einstein’s derivation of E = mc² Example: 16


On page 126, we showed that the celebrated E = mc² follows directly
from the form of the Lorentz transformation. An alternative
derivation was given by Einstein in one of his classic 1905 papers 
laying out the theory of special relativity; the paper is short, and is 
reproduced in English translation on page ?? of this book. Having 
laid the groundwork of four-vectors and relativistic Doppler shifts, 
we can give an even shorter version of Einstein’s argument. The 
discussion is also streamlined by restricting the discussion to 1+1 
dimensions and by invoking photons. 


Suppose that a lantern, at rest in the lab frame, is floating weight- 
lessly in outer space, and simultaneously emits two pulses of 
light in opposite directions, each with energy E/2 and frequency 
f. By symmetry, the momentum of the pulses cancels, and the 
lantern remains at rest. An observer in motion at velocity v relative
to the lab sees the frequencies of the beams shifted to
f' = (1 ± v)γf. The effect on the energies of the beams can
be found purely classically, by transforming the electric and magnetic
fields to the moving frame, but as a shortcut we can apply
the quantum-mechanical relation E_photon = hf for the energies of
the photons making up the beams. The result is that the moving
observer finds the total energy of the beams to be not E but
(E/2)(1 + v)γ + (E/2)(1 − v)γ = Eγ.

Both observers agree that the lantern had to use up some of the 
energy stored in its fuel in order to make the two pulses. But 
the moving observer says that in addition to this energy E, there 
was a further energy E(γ − 1). Where could this energy have
come from? It must have come from the kinetic energy of the 
lantern. The lantern’s velocity remained constant throughout the 
experiment, so this decrease in kinetic energy seen by the moving 
observer must have come from a decrease in the lantern’s inertial 
mass — hence the title of Einstein’s paper, “Does the inertia of a 
body depend upon its energy content?” 


⁹MacArthur et al., Phys. Rev. Lett. 56 (1986) 282

b/Magnetism is a purely relativistic effect.


To figure out how much mass the lantern has lost, we have to 
decide how we can even define mass in this new context. In 
Newtonian mechanics, we had K = (1/2)mv², and by the correspondence
principle this must still hold in the low-velocity limit.
Expanding E(γ − 1) in a Taylor series, we find that it equals
E(v²/2) + ..., and in the low-velocity limit this must be the same
as ΔK = (1/2)Δmv², so Δm = E. Reinserting factors of c to get
back to nonrelativistic units, we have E = Δmc².
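Both steps of the argument can be checked with a few lines of arithmetic. This is a minimal sketch in units with c = 1; the function names are hypothetical. It confirms that the two Doppler-shifted pulses carry total energy Eγ, and that the excess E(γ − 1) approaches E(v²/2) at low velocity.

```python
import math

def gamma(v):
    """Lorentz factor for speed v, with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def beam_energy_moving_frame(E, v):
    """Total energy of two opposite pulses, each of energy E/2 in the
    lab frame, as judged by an observer moving at velocity v."""
    g = gamma(v)
    return (E / 2) * (1 + v) * g + (E / 2) * (1 - v) * g

def excess_energy(E, v):
    """Extra energy E*(gamma - 1) found by the moving observer."""
    return beam_energy_moving_frame(E, v) - E

E = 1.0
for v in (0.3, 0.1, 0.01):
    # The ratio of the excess to E*v^2/2 tends to 1 as v gets small.
    print(v, excess_energy(E, v), excess_energy(E, v) / (0.5 * E * v * v))
```

The ratio in the last column exceeds 1 by about (3/4)v², which is the next term in the Taylor series.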


4.2.4 A non-example: electric and magnetic fields


It is fairly easy to see that the electric and magnetic fields cannot 
be the spacelike parts of two four-vectors. Consider the arrangement 
shown in figure b/1. We have two infinite trains of moving charges 
superimposed on the same line, and a single charge alongside the 
line. Even though the line charges formed by the two trains are 
moving in opposite directions, their currents don’t cancel. A nega- 
tive charge moving to the left makes a current that goes to the right, 
so in frame 1, the total current is twice that contributed by either 
line charge. 


In frame 1 the charge densities of the two line charges cancel out, 
and the electric field experienced by the lone charge is therefore zero. 
Frame 2 shows what we’d see if we were observing all this from a 
frame of reference moving along with the lone charge. Both line 
charges are in motion in both frames of reference, but in frame 
1, the line charges were moving at equal speeds, so their Lorentz 
contractions were equal, and their charge densities canceled out. In 
frame 2, however, their speeds are unequal. The positive charges 
are moving more slowly than in frame 1, so in frame 2 they are 
less contracted. The negative charges are moving more quickly, so 
their contraction is greater now. Since the charge densities don’t 
cancel, there is an electric field in frame 2, which points into the 
wire, attracting the lone charge. 


We appear to have a logical contradiction here, because an ob- 
server in frame 2 predicts that the charge will collide with the wire, 
whereas in frame 1 it looks as though it should move with constant 
velocity parallel to the wire. Experiments show that the charge does 
collide with the wire, so to maintain the Lorentz-invariance of elec- 
tromagnetism, we are forced to invent a new kind of interaction, one 
between moving charges and other moving charges, which causes the 
acceleration in frame 2. This is the magnetic interaction, and if we 
hadn’t known about it already, we would have been forced to invent 
it. That is, magnetism is a purely relativistic effect. The reason a 
relativistic effect can be strong enough to stick a magnet to a re- 
frigerator is that it breaks the delicate cancellation of the extremely 
large electrical interactions between electrically neutral objects. 


Although the example shows that the electric and magnetic fields 
do transform when we change from one frame to another, it is easy 
to show that they do not transform as the spacelike parts of a relativistic
four-vector. This is because the transformation between frames
1 and 2 is along the axis parallel to the wire, but it affects the com- 
ponents of the fields perpendicular to the wire. The electromagnetic 
field actually transforms as a rank-2 tensor. 
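The charge-density argument of this section can be made quantitative. The sketch below rests on several assumptions made only for illustration: c = 1, the positive charges move at +u and the negative ones at −u in frame 1, each train has the same density `lam0` in its own rest frame, and frame 2 moves at velocity v along the wire. The function names are hypothetical.

```python
import math

def gamma(v):
    """Lorentz factor for speed v, with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def velocity_in_moving_frame(u, v):
    """Collinear velocity u as measured in a frame moving at velocity v."""
    return (u - v) / (1.0 - u * v)

def net_charge_density(lam0, u, v):
    """Net linear charge density of the two trains in a frame moving
    at velocity v. Each train is Lorentz-contracted by its own gamma,
    which raises its density above the rest-frame value lam0."""
    u_pos = velocity_in_moving_frame(u, v)     # positive charges
    u_neg = velocity_in_moving_frame(-u, v)    # negative charges
    return lam0 * gamma(u_pos) - lam0 * gamma(u_neg)

lam0, u, v = 1.0, 0.5, 0.3
print(net_charge_density(lam0, u, 0.0))   # frame 1: densities cancel
print(net_charge_density(lam0, u, v))     # frame 2: net negative charge
# Closed form for comparison: -2*lam0*gamma(u)*gamma(v)*u*v
print(-2.0 * lam0 * gamma(u) * gamma(v) * u * v)
```

In frame 2 the wire carries a net negative charge, so it attracts a positive lone charge, in agreement with the qualitative argument. The closed form equals −γ(v) v I, where I is the frame-1 current, which is how the timelike component of a four-vector made of charge density and current would transform.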


4.2.5 The electromagnetic potential four-vector 


An electromagnetic quantity that does transform as a four-vector 
is the potential. On page 119, I mentioned the fact, which may or 
may not already be familiar to you, that whereas the Newtonian 
gravitational field’s polarization properties allow it to be described 
using a single scalar potential φ or a single vector field g = −∇φ,
the pair of electromagnetic fields (E, B) needs a pair of potentials,
Φ and A. It’s easy to see that Φ can’t be a Lorentz scalar. Electric
charge q is a scalar, so if Φ were a scalar as well, then the
product qΦ would be a scalar. But this is equal to the energy of
the charged particle, which is only the timelike component of the
energy-momentum four-vector, and therefore not a Lorentz scalar
itself. This is a contradiction, so Φ is not a scalar.


To see how to fit Φ into relativity, consider the nonrelativistic
quantum mechanical relation qΦ = hf for a charged particle in a
potential Φ. Since f is the timelike component of a four-vector in
relativity, we need Φ to be the timelike component of some four-vector,
A_b. For the spacelike part of this four-vector, let’s write A,
so that A_b = (Φ, A). We can see by the following argument that
this mysterious A must have something to do with the magnetic
field.


Consider the example of figure c from a quantum-mechanical 
point of view. The charged particle q has wave properties, but let’s
say that it can be well approximated in this example as following a 
specific trajectory. This is like the ray approximation to wave optics. 
A light ray in classical optics follows Fermat’s principle, also known 
as the principle of least time, which states that the ray’s path from 
point A to point B is one that extremizes the optical path length 
(essentially the number of oscillations). The reason for this is that 
the ray approximation is only an approximation. The ray actually 
has some width, which we can visualize as a bundle of neighboring 
trajectories. Only if the trajectory follows Fermat’s principle will 
the interference among the neighboring paths be constructive. The 
classical optical path length is found by integrating k · ds, where k
is the wavenumber. To make this relativistic, we need to use the
frequency four-vector to form f_b dx^b, which can also be expressed
as f_b v^b dτ = γ(f − k · v) dτ. If the charge is at rest and there
are no magnetic fields, then the quantity in parentheses is f =
E/h = (q/h)Φ. The correct relativistic generalization is clearly
f_b = (q/h)A_b.


Since A_b’s spacelike part, A, results in the velocity-dependent
effects, we conclude that A is a kind of potential that relates to the
magnetic field, in the same way that the potential Φ relates to the
electric field. A is known as the vector potential, and the relation
between the potentials and the fields is

\[ \mathbf{E} = -\nabla\Phi - \frac{\partial \mathbf{A}}{\partial t} \]
\[ \mathbf{B} = \nabla \times \mathbf{A} . \]

An excellent discussion of the vector potential from a purely classical
point of view is given in the classic Feynman Lectures.¹⁰ Figure d
shows an example.

c/The charged particle follows a trajectory that extremizes
∫ f_b dx^b compared to other nearby trajectories. Relativistically,
the trajectory should be understood as a world-line in
3+1-dimensional spacetime.

d/The magnetic field (top) and vector potential (bottom) of
a solenoid. The lower diagram is in the plane cutting through the
waist of the solenoid, as indicated by the dashed line in the upper
diagram. For an infinite solenoid, the magnetic field is uniform
on the inside and zero on the outside, while the vector potential
is proportional to r on the inside and to 1/r on the outside.


4.3 The tensor transformation laws 


We may wish to represent a vector in more than one coordinate 
system, and to convert back and forth between the two represen- 
tations. In general relativity, the transformation of the coordinates 
need not be linear, as in the Lorentz transformations; it can be any 
smooth, one-to-one function. For simplicity, however, we start by 
considering the one-dimensional case, and by assuming the coordinates
are related in an affine manner, x'^μ = ax^μ + b. The addition
of the constant b is merely a change in the choice of origin, so it
has no effect on the components of the vector, but the dilation by
the factor a gives a change in scale, which results in v'^μ = av^μ for a
contravariant vector. In the special case where v is an infinitesimal
displacement, this is consistent with the result found by implicit
differentiation of the coordinate transformation. For a covariant
vector, v'_μ = (1/a)v_μ. Generalizing to more than one dimension, and to
a possibly nonlinear transformation, we have

\[ [1] \qquad v'^\mu = v^\kappa \frac{\partial x'^\mu}{\partial x^\kappa} \]
\[ [2] \qquad v'_\mu = v_\kappa \frac{\partial x^\kappa}{\partial x'^\mu} \]

Note the inversion of the partial derivative in one equation compared
to the other. Because these equations describe a change from one
coordinate system to another, they clearly depend on the coordinate
system, so we use Greek indices rather than the Latin ones that
would indicate a coordinate-independent equation. Note that the
letter μ in these equations always appears as an index referring to
the new coordinates, κ to the old ones. For this reason, we can get
away with dropping the primes and writing, e.g., v^μ = v^κ ∂x^μ/∂x^κ
rather than v'^μ, counting on context to show that v^μ is the vector
expressed in the new coordinates, v^κ in the old ones. This becomes
especially natural if we start working in a specific coordinate system
where the coordinates have names. For example, if we transform
from coordinates (t, x, y, z) to (a, b, c, d), then it is clear that v^t is
expressed in one system and v^c in the other.

¹⁰The Feynman Lectures on Physics, Feynman, Leighton, and Sands, Addison
Wesley Longman, 1970


Self-check: Recall that the gauge transformations allowed in gen- 
eral relativity are not just any coordinate transformations; they 
must be (1) smooth and (2) one-to-one. Relate both of these re- 
quirements to the features of the vector transformation laws above. 


In equation [2], $\mu$ appears as a subscript on the left side of the
equation, but as a superscript on the right. This would appear
to violate our rules of notation, but the interpretation here is that
in expressions of the form $\partial/\partial x^i$ and
$\partial/\partial x_i$, the superscripts and subscripts should be
understood as being turned upside-down. Similarly, [1] appears to have
the implied sum over $\kappa$ written ungrammatically, with both
$\kappa$’s appearing as superscripts. Normally we only have implied
sums in which the index appears once as a superscript and once as a
subscript. With our new rule for interpreting indices on the bottom of
derivatives, the implied sum is seen to be written correctly. This rule
is similar to the one for analyzing the units of derivatives written in
Leibniz notation, with, e.g., $d^2x/dt^2$ having units of meters per
second squared. That is, the flipping of the indices like this is
required for consistency so that everything will work out properly when
we change our units of measurement, causing all our vector components
to be rescaled.


A quantity v that transforms according to [1] or [2] is referred 
to as a rank-1 tensor, which is the same thing as a vector. 
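

As a concrete illustration of equations [1] and [2], the following
minimal Python sketch transforms one vector with an upper index and one
with a lower index from Cartesian coordinates (x, y) to polar
coordinates (r, theta) in the Euclidean plane, and checks that their
contraction comes out the same in both systems. The choice of
coordinate systems, the sample point, the sample components, and all
function names are assumptions made for illustration; none of them
come from the text.

```python
import math

def jacobian_new_wrt_old(x, y):
    """d(r, theta)/d(x, y): rows are new coordinates, columns are old ones."""
    r = math.hypot(x, y)
    return [[x / r, y / r],
            [-y / r**2, x / r**2]]

def jacobian_old_wrt_new(r, theta):
    """d(x, y)/d(r, theta): rows are old coordinates, columns are new ones."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def transform_upper(v_old, x, y):
    """Equation [1]: v'^mu = v^kappa * dx'^mu/dx^kappa."""
    J = jacobian_new_wrt_old(x, y)
    return [sum(J[mu][k] * v_old[k] for k in range(2)) for mu in range(2)]

def transform_lower(w_old, x, y):
    """Equation [2]: w'_mu = w_kappa * dx^kappa/dx'^mu."""
    r, theta = math.hypot(x, y), math.atan2(y, x)
    K = jacobian_old_wrt_new(r, theta)
    return [sum(w_old[k] * K[k][mu] for k in range(2)) for mu in range(2)]

def contract(v_upper, w_lower):
    """Implied sum over one upper and one lower index."""
    return sum(a * b for a, b in zip(v_upper, w_lower))

# Hypothetical sample data: a point and the components of two vectors there.
point = (3.0, 4.0)
v = [1.0, 2.0]      # upper-index components in (x, y)
w = [0.5, -1.5]     # lower-index components in (x, y)

v_new = transform_upper(v, *point)
w_new = transform_lower(w, *point)

print("old contraction:", contract(v, w))
print("new contraction:", contract(v_new, w_new))
```

The two printed contractions agree up to rounding, which is the
numerical counterpart of the statement that the two partial-derivative
factors in [1] and [2] are inverted relative to one another.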


The identity transformation Example: 17
In the case of the identity transformation $x'^\mu = x^\mu$, equation [1]
clearly gives $v' = v$, since all the mixed partial derivatives
$\partial x'^\mu/\partial x^\kappa$ with $\mu \ne \kappa$ are zero, and
all the derivatives for $\kappa = \mu$ equal 1.


In equation [2], it is tempting to write


$$\frac{\partial x^\kappa}{\partial x'^\mu} = \frac{1}{\partial x'^\mu/\partial x^\kappa} \qquad \text{(wrong)},$$


but this would give infinite results for the mixed terms! Only in the 
case of functions of a single variable is it possible to flip deriva- 
tives in this way; it doesn’t work for partial derivatives. To evalu- 
ate these partial derivatives, we have to invert the transformation 
(which in this example is trivial to accomplish) and then take the 
partial derivatives. 
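

The same pitfall shows up in a less trivial case. The sketch below
uses the transformation from Cartesian (x, y) to polar (r, theta)
coordinates, which is an assumption chosen for illustration rather
than an example from the text. It compares the correct partial
derivative of x with respect to r, found by inverting the
transformation, with the naive reciprocal of the partial derivative of
r with respect to x. It also checks that the two Jacobian matrices are
inverses of each other as matrices, which is the correct sense in
which the derivatives are "flipped."

```python
import math

# Hypothetical sample point.
x, y = 3.0, 4.0
r, theta = math.hypot(x, y), math.atan2(y, x)

# New coordinates differentiated with respect to old ones.
J = [[x / r, y / r],              # dr/dx, dr/dy
     [-y / r**2, x / r**2]]       # dtheta/dx, dtheta/dy

# Old coordinates differentiated with respect to new ones, obtained from
# the inverted transformation x = r cos(theta), y = r sin(theta).
K = [[math.cos(theta), -r * math.sin(theta)],   # dx/dr, dx/dtheta
     [math.sin(theta), r * math.cos(theta)]]    # dy/dr, dy/dtheta

dx_dr_correct = K[0][0]          # cos(theta)
dx_dr_naive = 1.0 / J[0][0]      # 1/(dr/dx), the tempting but wrong flip

# Matrix product J K, which should be the identity.
product = [[sum(J[i][k] * K[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]

print("correct dx/dr:", dx_dr_correct)
print("naive 1/(dr/dx):", dx_dr_naive)
print("J K =", product)
```

At this point both dr/dx and dx/dr equal cos(theta), so the naive
reciprocal is off by a factor of 1/cos^2(theta).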


The metric is a rank-2 tensor, and transforms analogously:


$$g_{\mu\nu} = g_{\kappa\lambda} \frac{\partial x^\kappa}{\partial x'^\mu} \frac{\partial x^\lambda}{\partial x'^\nu}$$


(writing $g$ rather than $g'$ on the left, because context makes the
distinction clear).
Self-check: Write the similar expressions for $g^{\mu\nu}$, $g^\mu_\nu$,
and $g_\mu^\nu$, which are entirely determined by the grammatical rules
for writing superscripts and subscripts. Interpret the case of a rank-0
tensor.


An accelerated coordinate system? Example: 18
Let’s see the effect on the Lorentzian metric g of the transformation


$$t' = t \qquad x' = x + \frac{1}{2}at^2.$$


The inverse transformation is


$$t = t' \qquad x = x' - \frac{1}{2}at'^2.$$


The tensor transformation law gives


$$g'_{t't'} = 1 - (at')^2$$

$$g'_{x'x'} = -1$$

$$g'_{x't'} = at'.$$


Clearly something bad happens at $at' = \pm 1$, when the relative
velocity surpasses the speed of light: the $t't'$ component of the
metric vanishes and then reverses its sign. This would be physically
unreasonable if we viewed this as a transformation from observer
A’s Lorentzian frame into the accelerating reference frame
of observer B aboard a spaceship who feels a constant acceleration.
Several things prevent such an interpretation: (1) B cannot
exceed the speed of light. (2) Even before B gets to the speed
of light, the coordinate $t'$ cannot correspond to B’s proper time,
which is dilated. (3) Due to time dilation, A and B do not agree
on the rate at which B is accelerating. If B measures her own
acceleration to be $a'$, A will judge it to be $a < a'$, with
$a \to 0$ as B approaches the speed of light. There is nothing invalid
about the coordinate system $(t', x')$, but neither does it have any
physically interesting interpretation.
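

The components in example 18 can be checked numerically. The sketch
below applies the rank-2 transformation law to the Lorentzian metric
diag(1, -1) in 1+1 dimensions, using the partial derivatives of the
inverse transformation t = t', x = x' - at'^2/2. The function names
and the sample values of a and t' are assumptions made for
illustration. With the inverse transformation written this way, the
mixed component evaluates to +at'.

```python
def transform_metric(g_old, jac):
    """Rank-2 law: g'_{mu nu} = g_{kappa lambda} jac[kappa][mu] jac[lambda][nu],
    where jac[kappa][mu] = d x^kappa / d x'^mu (old coordinates w.r.t. new)."""
    n = len(g_old)
    return [[sum(g_old[k][l] * jac[k][mu] * jac[l][nu]
                 for k in range(n) for l in range(n))
             for nu in range(n)]
            for mu in range(n)]

def accelerated_metric(a, t_new):
    """Metric components in (t', x') for t = t', x = x' - a t'^2 / 2."""
    lorentzian = [[1.0, 0.0],
                  [0.0, -1.0]]        # signature +-, coordinate order (t, x)
    jac = [[1.0, 0.0],                # dt/dt', dt/dx'
           [-a * t_new, 1.0]]         # dx/dt', dx/dx'
    return transform_metric(lorentzian, jac)

# Hypothetical sample values, in units where c = 1.
g = accelerated_metric(a=0.5, t_new=1.0)            # a t' = 0.5
g_critical = accelerated_metric(a=0.5, t_new=2.0)   # a t' = 1

print("g'_t't' =", g[0][0])
print("g'_x't' =", g[0][1])
print("g'_x'x' =", g[1][1])
print("g'_t't' at a t' = 1:", g_critical[0][0])
```

For a t' = 0.5 the time-time component is 1 - 0.25 = 0.75, and it
vanishes at a t' = 1, which is the misbehavior described above.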


Physically meaningful constant acceleration Example: 19
To make a more physically meaningful version of example 18, we
need to use the result of example 4 on page 126. The somewhat
messy derivation of the coordinate transformation is given
by Semay.¹¹ The result is


$$t' = \left(x + \frac{1}{a}\right)\sinh at$$

$$x' = \left(x + \frac{1}{a}\right)\cosh at.$$


Applying the tensor transformation law gives (problem 7, page 156):


$$g'_{t't'} = (1 + ax')^2$$

$$g'_{x'x'} = -1.$$


Unlike the result of example 18, this one never misbehaves.


The closely related topic of a uniform gravitational field in general
relativity is considered in problem 7 on page 209.


¹¹ arxiv.org/abs/physics/0601179


Accurate timing signals Example: 20 

The relation between the potential A and the fields E and B 
given on page 137 can be written in manifestly covariant form as 
$F_{ij} = \partial_{[i}A_{j]}$, where F, called the electromagnetic tensor,
is an antisymmetric rank-two tensor whose six independent components
correspond in a certain way with the components of the E and 
B three-vectors. If F vanishes completely at a certain point in 
spacetime, then the linear form of the tensor transformation laws 
guarantees that it will vanish in all coordinate systems, not just 
one. The GPS system takes advantage of this fact in the trans- 
mission of timing signals from the satellites to the users. The 
electromagnetic wave is modulated so that the bits it transmits 
are represented by phase reversals of the wave. At these phase 
reversals, F vanishes, and this vanishing holds true regardless of 
the motion of the user’s unit or its position in the earth’s gravita- 
tional field. Cf. problem 17 on p. 157. 


Momentum wants a lower index Example: 21 
In example 5 on p. 48, we saw that once we arbitrarily chose to 
write ruler measurements in Euclidean three-space as $\Delta x^a$ rather
than $\Delta x_a$, it became natural to think of the Newtonian force
three-vector as “wanting” to be notated with a lower index. We can
do something similar with the momentum 3- or 4-vector. The
Lagrangian is a relativistic scalar, and in Lagrangian mechanics
momentum is defined by $p_a = \partial L/\partial v^a$. The upper index in the
denominator on the right becomes a lower index on the left by 
the same reasoning as was employed in the notation of the ten- 
sor transformation laws. Newton’s second law shows that this is 
consistent with the result of example 5 on p. 48. 




4.4 Experimental tests 


4.4.1 Universality of tensor behavior 


The techniques developed in this chapter allow us to make a vari- 
ety of new predictions that can be tested by experiment. In general, 
the mathematical treatment of all observables in relativity as ten- 
sors means that all observables must obey the same transformation 
laws. This is an extremely strict statement, because it requires that 
a wide variety of physical systems show identical behavior. For ex- 
ample, we already mentioned on page 73 the 2007 Gravity Probe 
B experiment (discussed in detail on pages 170 and 224), in which 
four gyroscopes aboard a satellite were observed to precess due to 
special- and general-relativistic effects. The gyroscopes were compli- 
cated electromechanical systems, but the predicted precession was 
entirely independent of these complications. We argued that if two 
different types of gyroscopes displayed different behaviors, then the 
resulting discrepancy would allow us to map out some mysterious 
vector field. This field would be a built-in characteristic of space- 
time (not produced by any physical objects nearby), and since all 
observables in general relativity are supposed to be tensors, the field 
would have to transform as a tensor. Let’s say that this tensor was 
of rank 1. Since the tensor transformation law is linear, a nonzero 
tensor can never be transformed into a vanishing tensor in another 
coordinate system. But by the equivalence principle, any special, 
local property of spacetime can be made to vanish by transforming 
into a free-falling frame of reference, in which the spacetime has a
generic Lorentzian geometry. The mysterious new field should therefore
vanish in such a frame. This is a contradiction, so we conclude
that different types of gyroscopes cannot differ in their behavior. 


This is an example of a new way of stating the equivalence principle:
there is no way to associate a preferred tensor field with
spacetime.¹²


4.4.2 Speed of light differing from c 


In a Lorentz invariant theory, we interpret c as a property of 
the underlying spacetime, not of the particles that inhabit it. One 
way in which Lorentz invariance could be violated would be if
different types of particles had different maximum velocities. In 1997,
Coleman and Glashow suggested a sensitive test for such an effect.¹³


Assuming Lorentz invariance, a photon cannot decay into an
electron and a positron, $\gamma \to e^+ + e^-$ (example 12, page 131).
Suppose, however, that material particles have a maximum speed
$c_m = 1$, while photons have a maximum speed $c_p > 1$. Then the
photon’s momentum four-vector, $(E, E/c_p)$, is timelike, so a frame
does exist in which its three-momentum is zero. The detection of
cosmic-ray gammas from distant sources with energies on the order of 10
TeV puts an upper limit on the decay rate, implying
$c_p - 1 \lesssim 10^{-15}$.


¹² This statement of the equivalence principle, along with the others we
have encountered, is summarized in the back of the book on page 413.
¹³ arxiv.org/abs/hep-ph/9703240
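

The order of magnitude of this bound can be recovered from kinematics
alone. If the photon's four-vector is (E, E/c_p) in units with
c_m = 1, its invariant mass squared is E^2 (1 - 1/c_p^2), and decay
into a pair requires this to be at least (2 m_e)^2. The sketch below
solves for the smallest c_p - 1 at which a photon of a given energy
could decay. It is a threshold estimate only, under the assumption
that the decay proceeds rapidly once it is kinematically allowed; it
is not the analysis of the Coleman-Glashow paper. The function name is
hypothetical, and the electron rest energy is the standard rounded
value.

```python
import math

M_E_MEV = 0.511   # electron rest energy in MeV (standard rounded value)

def min_excess_speed_for_decay(photon_energy_mev):
    """Smallest c_p - 1 (units with c_m = 1) at which the photon's invariant
    mass reaches 2 m_e, so that decay to e+ e- is kinematically allowed."""
    ratio = (2.0 * M_E_MEV / photon_energy_mev) ** 2   # equals 1 - 1/c_p**2
    # c_p - 1 = (1 - ratio)**(-1/2) - 1, evaluated with log1p/expm1 to avoid
    # losing precision when ratio is tiny.
    return math.expm1(-0.5 * math.log1p(-ratio))

ten_tev_in_mev = 1.0e7
threshold = min_excess_speed_for_decay(ten_tev_in_mev)
print("decay allowed for c_p - 1 above about", threshold)
```

For a 10 TeV photon the threshold is roughly 5 x 10^-15, so the
survival of such photons over astronomical distances constrains
c_p - 1 at about the 10^-15 level.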


An even more stringent limit can be put on the possibility of
$c_p < 1$. When a charged particle moves through a medium at a speed
higher than the speed of light in the medium, Cerenkov radiation
results. If $c_p$ is less than 1, then Cerenkov radiation could be emitted
by high-energy charged particles in a vacuum, and the particles
would rapidly lose energy. The observation of cosmic-ray protons
with energies $\sim 10^8$ TeV requires $c_p - 1 \gtrsim -10^{-23}$.


4.4.3 Degenerate matter 


The straightforward properties of the momentum four-vector 
have surprisingly far-reaching implications for matter subject to ex- 
treme pressure, as in a star that uses up all its fuel for nuclear fusion 
and collapses. These implications were initially considered too ex- 
otic to be taken seriously by astronomers. For historical perspective, 
consider that in 1916, when Einstein published the theory of gen- 
eral relativity, the Milky Way was believed to constitute the entire 
universe; the “spiral nebulae” were believed to be inside it, rather 
than being similar objects exterior to it. The only types of stars 
whose structure was understood even vaguely were those that were 
roughly analogous to our own sun. (It was not known that nuclear 
fusion was their source of energy.) The term “white dwarf” had not 
been invented, and neutron stars were unknown. 


An ordinary, smallish star such as our own sun has enough hy- 
drogen to sustain fusion reactions for billions of years, maintaining 
an equilibrium between its gravity and the pressure of its gases. 
When the hydrogen is used up, it has to begin fusing heavier el- 
ements. This leads to a period of relatively rapid fluctuations in 
structure. Nuclear fusion proceeds up until the formation of ele- 
ments as heavy as oxygen (Z = 8), but the temperatures are not 
high enough to overcome the strong electrical repulsion of these nu- 
clei to create even heavier ones. Some matter is blown off, but finally 
nuclear reactions cease and the star collapses under the pull of its 
own gravity. 


To understand what happens in such a collapse, we have to un- 
derstand the behavior of gases under very high pressures. In gen- 
eral, a surface area A within a gas is subject to collisions in a time t 
from the n particles occupying the volume V = Avt, where v is the 
typical velocity of the particles. The resulting pressure is given by 
P ~npv/V, where p is the typical momentum. 


Nondegenerate gas: In an ordinary gas such as air, the particles
are nonrelativistic, so $v = p/m$, and the thermal energy
per particle is $p^2/2m \sim kT$, so the pressure is $P \sim nkT/V$.


Nonrelativistic, degenerate gas: When a fermionic gas is subject
to extreme pressure, the dominant effects creating pressure
are quantum-mechanical. Because of the Pauli exclusion
principle, the volume available to each particle is $\sim V/n$,
so its wavelength is no more than $\sim (V/n)^{1/3}$, leading to
$p = h/\lambda \sim h(n/V)^{1/3}$. If the speeds of the particles are still
nonrelativistic, then $v = p/m$ still holds, so the pressure
becomes $P \sim (h^2/m)(n/V)^{5/3}$.


Relativistic, degenerate gas: If the compression is strong enough
to cause highly relativistic motion for the particles, then $v \approx c$,
and the result is $P \sim hc(n/V)^{4/3}$.
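

The three scaling laws above can be put side by side in a short
sketch. The functions below implement the order-of-magnitude
expressions exactly as written, dropping all numerical factors of
order unity, so the results are only meaningful to within an order of
magnitude. The function names are hypothetical, and the physical
constants are standard rounded SI values. The last function estimates
the number density at which the two degenerate expressions agree,
which is where the typical momentum h(n/V)^(1/3) becomes comparable to
mc and the electrons start to become relativistic.

```python
H = 6.626e-34      # Planck's constant, J s
C = 2.998e8        # speed of light, m/s
M_E = 9.109e-31    # electron mass, kg
K_B = 1.381e-23    # Boltzmann's constant, J/K

def pressure_nondegenerate(number_density, temperature):
    """P ~ (n/V) k T for an ordinary gas."""
    return number_density * K_B * temperature

def pressure_degenerate_nonrel(number_density, mass=M_E):
    """P ~ (h^2/m) (n/V)^(5/3) for a nonrelativistic degenerate gas."""
    return (H**2 / mass) * number_density ** (5.0 / 3.0)

def pressure_degenerate_rel(number_density):
    """P ~ h c (n/V)^(4/3) for a relativistic degenerate gas."""
    return H * C * number_density ** (4.0 / 3.0)

def crossover_density(mass=M_E):
    """Number density at which the two degenerate estimates coincide,
    i.e. where h (n/V)^(1/3) = m c."""
    return (mass * C / H) ** 3

n_cross = crossover_density()
print("crossover number density (m^-3):", n_cross)
print("pressure there (Pa):", pressure_degenerate_rel(n_cross))
```

Below the crossover density the 5/3 law gives the smaller pressure and
is the appropriate estimate; above it, the 4/3 law takes over.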


As a star with the mass of our sun collapses, it reaches a point 
at which the electrons begin to behave as a degenerate gas, and 
the collapse stops. The resulting object is called a white dwarf. A 
white dwarf should be an extremely compact body, about the size 
of the Earth. Because of its small surface area, it should emit very 
little light. In 1910, before the theoretical predictions had been 
made, Russell, Pickering, and Fleming discovered that 40 Eridani B 
had these characteristics. Russell recalled: “I knew enough about 
it, even in these paleozoic days, to realize at once that there was 
an extreme inconsistency between what we would then have called 
‘possible’ values of the surface brightness and density. I must have 
shown that I was not only puzzled but crestfallen, at this exception 
to what looked like a very pretty rule of stellar characteristics; but 
Pickering smiled upon me, and said: ‘It is just these exceptions 
that lead to an advance in our knowledge,’ and so the white dwarfs 
entered the realm of study!” 


S. Chandrasekhar showed in the 1930’s that there was an upper
limit to the mass of a white dwarf. We will recapitulate his
calculation briefly in condensed order-of-magnitude form. The pressure
at the core of the star is $P \sim \rho g r \sim GM^2/r^4$, where M is the
total mass of the star. The star contains roughly equal numbers of
neutrons, protons, and electrons, so $M = Knm$, where m is the mass of
the electron, n is the number of electrons, and $K \approx 4000$. For stars
near the limit, the electrons are relativistic. Setting the pressure at
the core equal to the degeneracy pressure of a relativistic gas, we
find that the Chandrasekhar limit is
$\sim (hc/G)^{3/2}(Km)^{-2} = 6M_\odot$. A less sloppy calculation gives
something more like $1.4M_\odot$. The self-consistency of this solution
is investigated in homework problem 15 on page 157.
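

The arithmetic behind the order-of-magnitude estimate can be checked
directly. Setting GM^2/r^4 equal to hc(n/r^3)^(4/3) with n = M/(Km),
the radius cancels and one is left with M ~ (hc/G)^(3/2) (Km)^(-2).
The sketch below evaluates this with standard rounded SI constants and
the value K = 4000 used in the text. The function and variable names
are hypothetical.

```python
H = 6.626e-34       # Planck's constant, J s
C = 2.998e8         # speed of light, m/s
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
M_E = 9.109e-31     # electron mass, kg
M_SUN = 1.989e30    # solar mass, kg
K = 4000.0          # stellar mass per electron, in units of the electron mass

def chandrasekhar_estimate(k=K, fermion_mass=M_E):
    """Order-of-magnitude mass limit (h c / G)^(3/2) (K m)^(-2), in kg."""
    return (H * C / G) ** 1.5 / (k * fermion_mass) ** 2

estimate_in_solar_masses = chandrasekhar_estimate() / M_SUN
print("crude Chandrasekhar limit, in solar masses:", estimate_in_solar_masses)
```

The result is about 6 solar masses, matching the crude figure quoted
above.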


What happens to a star whose mass is above the Chandrasekhar
limit? As nuclear fusion reactions flicker out, the core of the star
becomes a white dwarf, but once fusion ceases completely this cannot
be an equilibrium state. Now consider the nuclear reactions


$$n \to p + e^- + \bar{\nu}$$

$$p + e^- \to n + \nu,$$


which happen due to the weak nuclear force. The first of these
releases 0.8 MeV, and has a half-life of about 10 minutes. This explains
why free neutrons are not observed in significant numbers in our 
universe, e.g., in cosmic rays. The second reaction requires an input 
of 0.8 MeV of energy, so a free hydrogen atom is stable. The white 
dwarf contains fairly heavy nuclei, not individual protons, but sim- 
ilar considerations would seem to apply. A nucleus can absorb an 
electron and convert a proton into a neutron, and in this context the 
process is called electron capture. Ordinarily this process will only 
occur if the nucleus is neutron-deficient; once it reaches a
neutron-to-proton ratio that optimizes its binding energy, electron capture
cannot proceed without a source of energy to make the reaction go.
In the environment of a white dwarf, however, there is such a source.
The annihilation of an electron opens up a hole in the “Fermi sea.”
There is now a state into which another electron is allowed to drop
without violating the exclusion principle, and the effect cascades 
upward. In a star with a mass above the Chandrasekhar limit, this 
process runs to completion, with every proton being converted into a 
neutron. The result is a neutron star, which is essentially an atomic 
nucleus (with Z = 0) with the mass of a star! 


Observational evidence for the existence of neutron stars came 
in 1967 with the detection by Bell and Hewish at Cambridge of a 
mysterious radio signal with a period of 1.3373011 seconds. The sig- 
nal’s observability was synchronized with the rotation of the earth 
relative to the stars, rather than with legal clock time or the earth’s 
rotation relative to the sun. This led to the conclusion that its origin 
was in space rather than on earth, and Bell and Hewish originally 
dubbed it LGM-1 for “little green men.” The discovery of a second 
signal, from a different direction in the sky, convinced them that it 
was not actually an artificial signal being generated by aliens. Bell 
published the observation as an appendix to her PhD thesis, and 
it was soon interpreted as a signal from a neutron star. Neutron 
stars can be highly magnetized, and because of this magnetization 
they may emit a directional beam of electromagnetic radiation that 
sweeps across the sky once per rotational period — the “lighthouse 
effect.” If the earth lies in the plane of the beam, a periodic signal 
can be detected, and the star is referred to as a pulsar. It is fairly
easy to see that the short period of rotation makes it difficult to
explain a pulsar as any kind of less exotic rotating object. In the
approximation of Newtonian mechanics, a spherical body of density
$\rho$, rotating with a period $T = \sqrt{3\pi/G\rho}$, has zero apparent
gravity at its equator, since gravity is just strong enough to
accelerate an object so that it follows a circular trajectory above a
fixed point on the surface (problem 14). In reality, astronomical
bodies of planetary size and greater are held together by their own
gravity, so we have $T \gtrsim 1/\sqrt{G\rho}$ for any body that does not
fly apart spontaneously due to its own rotation. In the case of the
Bell-Hewish pulsar, this implies $\rho \gtrsim 10^{10}$ kg/m³, which is far
larger than the density of normal matter, and also 10-100 times greater
than the typical density of a white dwarf near the Chandrasekhar limit.
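

The density bound follows from one line of arithmetic, sketched below.
The first function inverts the order-of-magnitude condition
T ~ 1/sqrt(G rho); the second inverts the Newtonian result for a
uniform sphere whose equator is on the verge of flying off, for which
omega^2 = (4/3) pi G rho and hence T = sqrt(3 pi / G rho). The
function names are hypothetical, and G is the standard rounded SI
value.

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
PERIOD = 1.3373011     # period of the Bell-Hewish pulsar, s

def min_density_order_of_magnitude(period):
    """Density scale from T ~ 1/sqrt(G rho), in kg/m^3."""
    return 1.0 / (G * period**2)

def min_density_uniform_sphere(period):
    """Density at which a uniform Newtonian sphere has zero apparent
    gravity at its equator, from T = sqrt(3 pi / (G rho)), in kg/m^3."""
    return 3.0 * math.pi / (G * period**2)

rho_rough = min_density_order_of_magnitude(PERIOD)
rho_sphere = min_density_uniform_sphere(PERIOD)
print("rough bound (kg/m^3):", rho_rough)
print("uniform-sphere bound (kg/m^3):", rho_sphere)
```

Both versions land in the range 10^10 to 10^11 kg/m^3, many orders of
magnitude above the density of ordinary matter.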


An upper limit on the mass of a neutron star can be found in a
manner entirely analogous to the calculation of the Chandrasekhar
limit. The only difference is that the mass of a neutron is much
greater than the mass of an electron, and the neutrons are the only
particles present, so there is no factor of K. Assuming the more
precise result of $1.4M_\odot$ for the Chandrasekhar limit rather than
our sloppy one, and ignoring the interaction of the neutrons via the
strong nuclear force, we can infer an upper limit on the mass of a
neutron star:


$$1.4M_\odot \left(\frac{Km_e}{m_n}\right)^2 \approx 5M_\odot$$


The theoretical uncertainties in such an estimate are fairly large.
Tolman, Oppenheimer, and Volkoff originally estimated it in 1939
as $0.7M_\odot$, whereas modern estimates are more in the range of 1.5
to $3M_\odot$. These are significantly lower than our crude estimate of
$5M_\odot$, mainly because the attractive nature of the strong nuclear
force tends to pull the star toward collapse. Unambiguous results
are presently impossible because of uncertainties in extrapolating
the behavior of the strong force from the regime of ordinary nuclei,
where it has been relatively well parametrized, into the exotic
environment of a neutron star, where the density is significantly
different and no protons are present. There are a variety of effects
that may be difficult to anticipate or to calculate. For example,
Brown and Bethe found in 1994¹⁴ that it might be possible for the mass
limit to be drastically revised because of the process
$e^- \to K^- + \nu_e$, which is impossible in free space due to
conservation of energy, but might be possible in a neutron star.
Observationally, nearly all neutron stars seem to lie in a surprisingly
small range of mass, between 1.3 and $1.45M_\odot$, but in 2010 a neutron
star with a mass of $1.97 \pm 0.04\,M_\odot$ was discovered,¹⁵ ruling out
most neutron-star models that included exotic matter.


For stars with masses above the Tolman-Oppenheimer-Volkoff
limit, theoretical predictions become even more speculative. A variety
of bizarre objects has been proposed, including black stars,
gravastars, quark stars, boson stars, Q-balls, and electroweak stars.
It seems likely, however, both on theoretical and observational grounds,
that objects with masses of about 3 to 20 solar masses end up as
black holes; see section 6.3.4.


¹⁴ H.A. Bethe and G.E. Brown, “Observational constraints on the maximum
neutron star mass,” Astrophys. J. 445 (1995) L129. G.E. Brown and H.A.
Bethe, “A Scenario for a Large Number of Low-Mass Black Holes in the Galaxy,”
Astrophys. J. 423 (1994) 659. Both papers are available at adsabs.harvard.edu.
¹⁵ Demorest et al., arxiv.org/abs/1010.5788v1.


4.5 Conservation laws


4.5.1 No general conservation laws 


Some of the first tensors we discussed were mass and charge, both 
rank-0 tensors, and the rank-1 momentum tensor, which contains 
both the classical energy and the classical momentum. Physicists 
originally decided that mass, charge, energy, and momentum were 
interesting because these things were found to be conserved. This 
makes it natural to ask how conservation laws can be formulated 
in relativity. We’re used to stating conservation laws casually in 
terms of the amount of something in the whole universe, e.g., that 
classically the total amount of mass in the universe stays constant. 
Relativity does allow us to make physical models of the universe as 
a whole, so it seems as though we ought to be able to talk about 
conservation laws in relativity. 


We can’t. 


First, how do we define “stays constant?” Simultaneity isn’t 
well-defined, so we can’t just take two snapshots, call them initial 
and final, and compare the total amount of, say, electric charge in 
each snapshot. This difficulty isn’t insurmountable. As in figure 
a, we can arbitrarily pick out three-dimensional spacelike surfaces 
— one initial and one final — and integrate the charge over each 
one. A law of conservation of charge would say that no matter what 
spacelike surface we picked, the total charge on each would be the 
same. 


Next there’s the issue that the integral might diverge, especially 
if the universe was spatially infinite. For now, let’s assume a spa- 
tially finite universe. For simplicity, let’s assume that it has the 
topology of a three-sphere (see section 8.2 for reassurance that this 
isn’t physically unreasonable), and we can visualize it as a two- 
sphere. 


In the case of the momentum four-vector, what coordinate sys- 
tem would we express it in? In general, we do not even expect to 
be able to define a smooth, well-behaved coordinate system that 
covers the entire universe, and even if we did, it would not make 
sense to add a vector expressed in that coordinate system at point 
A to another vector from point B; the best we could do would be 
to parallel-transport the vectors to one point and then add them, 
but parallel transport is path-dependent. (Similar issues occur with 
angular momentum.) For this reason, let’s restrict ourselves to the 
easier case of a scalar, such as electric charge. 


But now we’re in real trouble. How would we go about actually 
measuring the total electric charge of the universe? The only way to 
do it is to measure electric fields, and then apply Gauss’s law. This 
requires us to single out some surface that we can integrate the flux 
over, as in b. This would really be a two-dimensional surface on the 
three-sphere, but we can visualize it as a one-dimensional surface
(a closed curve) on the two-sphere. But now suppose this curve is
a great circle, c. If we measure a nonvanishing total flux across it, 
how do we know where the charge is? It could be on either side. 


The conclusion is that conservation laws only make sense in relativity
under very special circumstances.¹⁶ We do not have anything
like over-arching, global principles of conservation. As an example 
of the appropriate special circumstances, section 6.2.6, p. 228 shows 
how to define conserved quantities, which behave like energy and 
momentum, for the motion of a test particle in a particular metric 
that has a certain symmetry. This is generalized on p. 266 to a 
general, global conservation law corresponding to every continuous 
symmetry of a spacetime. A weak kind of energy conservation can 
be proved as well in a general spacetime; see sec. 8.1.3. 


4.5.2 Conservation of angular momentum and frame dragging


Another special case where conservation laws work is that if 
the spacetime we’re studying gets very flat at large distances from a 
small system we’re studying, then we can define a far-away boundary 
that surrounds the system, measure the flux through that boundary,
and find the system’s charge. For such asymptotically flat
spacetimes, we can also get around the problems that crop up with
conserved vectors, such as momentum. (Asymptotic flatness is dis- 
cussed in more detail in section 7.4.2.) If the spacetime far away 
is nearly flat, then parallel transport loses its path-dependence, so 
we can unambiguously define a notion of parallel-transporting all 
the contributions to the flux to one arbitrarily chosen point P and 
then adding them. Asymptotic flatness also allows us to define an 
approximate notion of a global Lorentz frame, so that the choice of 
P doesn’t matter. 


As an example, figure d shows a jet of matter being ejected from
the galaxy M87 at ultrarelativistic speeds. The blue color of the jet in
the visible-light image comes from synchrotron radiation, which is 
the electromagnetic radiation emitted by relativistic charged parti- 
cles accelerated by a magnetic field. The jet is believed to be coming 
from a supermassive black hole at the center of M87. The emission 
of the jet in a particular direction suggests that the black hole is not 
spherically symmetric. It seems to have a particular axis associated 
with it. How can this be? Our sun’s spherical symmetry is broken 
by the existence of externally observable features such as sunspots 
and the equatorial bulge, but the only information we can get about 
a black hole comes from its external gravitational (and possibly
electromagnetic) fields. It appears that something about the spacetime
metric surrounding this black hole breaks spherical symmetry, but
preserves symmetry about some preferred axis. What aspect of the
initial conditions in the formation of the hole could have determined
such an axis? The most likely candidate is the angular momentum.
We are thus led to suspect that black holes can possess angular
momentum, that angular momentum preserves information about their
formation, and that angular momentum is externally detectable via
its effect on the spacetime metric.


d / A relativistic jet.


¹⁶ For another argument leading to the same conclusion, see subsection
7.5.1, p. 288.


What would the form of such a metric be? Spherical coordinates
in flat spacetime give a metric like this:


$$ds^2 = dt^2 - dr^2 - r^2 d\theta^2 - r^2 \sin^2\theta\, d\phi^2.$$


We’ll see in chapter 6 that for a non-rotating black hole, the metric
is of the form


$$ds^2 = (\ldots)\, dt^2 - (\ldots)\, dr^2 - r^2 d\theta^2 - r^2 \sin^2\theta\, d\phi^2,$$


where (...) represents functions of r. In fact, there is nothing spe- 
cial about the metric of a black hole, at least far away; the same 
external metric applies to any spherically symmetric, non-rotating 
body, such as the moon. Now what about the metric of a rotating 
body? We expect it to have the following properties: 


1. It has terms that are odd under time-reversal, corresponding 
to reversal of the body’s angular momentum. 


2. Similarly, it has terms that are odd under reversal of the
differential $d\phi$ of the azimuthal coordinate.


3. The metric should have axial symmetry, i.e., it should be
independent of $\phi$.


Restricting our attention to the equatorial plane $\theta = \pi/2$, the
simplest modification that has these three properties is to add a term
of the form


$$f(\ldots) L\, d\phi\, dt,$$


where $(\ldots)$ again gives the r-dependence and L is a constant,
interpreted as the angular momentum. A detailed treatment is beyond
the scope of this book, but solutions of this form to the relativistic 
field equations were found by New Zealand-born physicist Roy Kerr 
in 1963 at the University of Texas at Austin. 
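

The symmetry properties of such a cross term can be illustrated with a
toy line element. In the sketch below, the diagonal coefficients are
simply those of flat spacetime and the r-dependence of the cross term
is an arbitrary placeholder, so this is emphatically not the Kerr
metric; the function name and the sample displacements are
hypothetical as well. The point is only that a term proportional to
L dphi dt changes sign under time reversal or under reversal of dphi
alone, and is restored when L is reversed along with it.

```python
def interval_squared(dt, dr, dphi, r, L, cross=lambda r: 1.0 / r):
    """Toy equatorial-plane (theta = pi/2) line element with a dphi*dt term.
    Diagonal coefficients are flat-spacetime ones and `cross` is a
    placeholder r-dependence; neither is taken from the Kerr solution."""
    return dt**2 - dr**2 - r**2 * dphi**2 + cross(r) * L * dphi * dt

# Hypothetical small displacements and parameters.
dt, dr, dphi, r, L = 1.0, 0.1, 0.2, 2.0, 0.5

original = interval_squared(dt, dr, dphi, r, L)
time_reversed = interval_squared(-dt, dr, dphi, r, L)
time_and_spin_reversed = interval_squared(-dt, dr, dphi, r, -L)
azimuth_and_spin_reversed = interval_squared(dt, dr, -dphi, r, -L)

print(original, time_reversed, time_and_spin_reversed)
```

Since the azimuthal coordinate itself never appears, only its
differential, the toy line element is also axially symmetric in the
sense of property 3.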


The astrophysical modeling of observations like figure d is com- 
plicated, but we can see in a simplified thought experiment that if 
we want to determine the angular momentum of a rotating body 
via its gravitational field, it will be difficult unless we use a measur- 
ing process that takes advantage of the asymptotic flatness of the 
space. For example, suppose we send two beams of light past the 
earth, in its equatorial plane, one on each side, and measure their
deflections, as in figure e. The deflections will be different, because
the sign of $d\phi\, dt$ will be opposite for the two beams. But the entire notion of a
“deflection” only makes sense if we have an asymptotically flat back- 
ground, as indicated by the dashed tangent lines. Also, if spacetime 
were not asymptotically flat in this example, then there might be 
no unambiguous way to determine whether the asymmetry was due 
to the earth’s rotation, to some external factor, or to some kind of 
interaction between the earth and other bodies nearby. 


It also turns out that a gyroscope in such a gravitational field 
precesses. This effect, called frame dragging, was predicted by Lense 
and Thirring in 1918, and was finally verified experimentally in 2008 
by analysis of data from the Gravity Probe B experiment, to a pre- 
cision of about 15%. The experiment was arranged so that the rela- 
tively strong geodetic effect (6.6 arc-seconds per year) and the much 
weaker Lense-Thirring effect (.041 arc-sec/yr) produced precessions 
in perpendicular directions. Again, the presence of an asymptoti- 
cally flat background was involved, because the probe measured the 
orientations of its gyroscopes relative to the guide star IM Pegasi. 


4.6 Things that aren’t quite tensors 


This section can be skipped on a first reading. 


4.6.1 Area, volume, and tensor densities 


We’ve embarked on a program of redefining every possible phys- 
ical quantity as a tensor, but so far we haven’t tackled area and 
volume. Is there, for example, an area tensor in a locally Euclidean 
plane? We are encouraged to hope that there is such a thing, be- 
cause on p. 45 we saw that we could cook up a measure of area 
with no other ingredients than the axioms of affine geometry. What 
kind of tensor would it be? The notions of vector and scalar from 
freshman mechanics are distinguished from one another by the fact 
that one has a direction in space and the other does not. Therefore 
we expect that area would be a scalar, i.e., a rank-0 tensor. But 
this can’t be right, for the following reason. Under a rescaling of 
Cartesian coordinates by a factor k, area should change by a factor 
of $k^2$. But by the tensor transformation laws, a rank-0 tensor is supposed 
to be invariant under a change of coordinates. We therefore 
conclude that quantities like area and volume are not tensors. 


In the language of ordinary vectors and scalars in Euclidean 
three-space, one way to express area and volume is by using dot and 
cross products. The area of the parallelogram spanned by u and v 
is measured by the area vector u × v, and similarly the volume of 
the parallelepiped formed by u, v, and w can be computed as the 
scalar triple product u · (v × w). Both of these quantities are defined 
such that interchanging two of the inputs negates the output. In 
differential geometry, we do have a scalar product, which is defined 
by contracting the indices of two vectors, as in $u^a v_a$. If we also had 
a tensorial cross product, we would be able to define area and volume 
tensors, so we conclude that there is no tensorial cross product, i.e., 
an operation that would multiply two rank-1 tensors to produce a 
rank-1 tensor. Since one of the most important physical applications 
of the cross product is to calculate the angular momentum L = r × p, 
we find that angular momentum in relativity is either not a tensor 
or not a rank-1 tensor. 


e / Two light rays travel in the 
earth’s equatorial plane from A to 
B. Due to frame-dragging, the ray 
moving with the earth’s rotation 
is deflected by a greater amount 
than the one moving contrary to 
it. As a result, the figure has an 
asymmetric banana shape. Both 
the deflection and its asymmetry 
are greatly exaggerated. 


two separate effects: geodetic 
and frame-dragging. 


a / A Möbius strip is not an 
orientable surface. 


When someone tells you that it’s impossible to do a seemingly 
straightforward thing, the typical response is to look for a way to get 
around the supposed limitation. In the case of a locally Euclidean 
plane, what is to stop us from making a small, standard square, and 
then sliding the square around to any desired location? If we have 
some figure whose area we wish to measure, we can then dissect it 
into squares of that size and count the number of squares. 


There are two problems with this plan, neither of which is completely 
insurmountable. First, the area vector u × v is a vector, with 
its orientation specified by the direction of the normal to the surface. 
We need this orientation, for example, when we calculate the electric 
flux as $\int \mathbf{E} \cdot d\mathbf{A}$. Figure a shows that we cannot always define 
such an orientation in a consistent way. When the x-y coordinate 
system is slid around the Möbius strip, it ends up with the opposite 
orientation. In general relativity, there is not any guarantee of 
orientability in space — or even in time! But the vast majority of 
spacetimes of physical interest are in fact orientable in every desired 
way, and even for those that aren’t, orientability still holds in any 
sufficiently small neighborhood. 


The other problem is that area has the wrong scaling properties 
to be a rank-0 tensor. We can get around this problem by being 
willing to discuss quantities that don’t transform exactly like ten- 
sors. Often we only care about transformations, such as rotations 
and translations, that don’t involve any scaling. We saw in sec- 
tion 2.2 on p. 51 that Lorentz boosts also have the special property 
of preserving area in a space-time plane containing the boost. We 
therefore define a tensor density as a quantity that transforms like 
a tensor under rotations, translations, and boosts, but that rescales 
and possibly flips its sign under other types of coordinate transformations. 
In general, the additional factor comes from the determinant 
d of the matrix consisting of the partial derivatives $\partial x'^\mu / \partial x^\nu$ 
(called the Jacobian matrix). This determinant is raised to a power 
W, known as the weight of the tensor density. Weight zero corresponds 
to the case of a real tensor. The definition of the sign of W 
is not standardized in the literature. The convention in this book is 
the one used by Carroll and Weinberg, but the opposite sign is used, 
for example, by Misner, Thorne, and Wheeler, and in the Wikipedia 
article “Tensor density.” 


Area as a tensor density Example: 22 
In a Euclidean plane, making our rulers shorter by a factor of k 
causes the area measured in the new coordinates to increase by 
a factor of $k^2$. The rescaling is represented by a matrix of partial 
derivatives that is simply kI, where I is the identity matrix. The 
determinant is $k^2$. Therefore area is a tensor density of weight 
+1. 


Mass density Example: 23 
A piece of aluminum foil has a certain number of milligrams per 
square centimeter. Shrinking rulers by 1/k causes this number to 
scale by $k^{-2}$, so this mass density has W = −1. 
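

The bookkeeping in examples 22 and 23 can be checked numerically. The following is only a minimal sketch under stated assumptions: the function names (`det2`, `rescaling_jacobian`, `transform_density`) and the sample numbers are made up for illustration, and the sign convention for W is the one adopted in this book.

```python
def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def rescaling_jacobian(k):
    """Matrix of partial derivatives for x' = k x, y' = k y (rulers shortened by k)."""
    return [[k, 0.0], [0.0, k]]


def transform_density(value, jacobian, weight):
    """Multiply a rank-0 tensor density by det(J)**W (sign convention of the text)."""
    return value * det2(jacobian) ** weight


k = 3.0
J = rescaling_jacobian(k)

area_old = 2.0    # area of some figure in the old square units (made-up number)
sigma_old = 5.0   # mass per unit area of the foil in the old units (made-up number)

area_new = transform_density(area_old, J, +1)    # W = +1: multiplied by k**2
sigma_new = transform_density(sigma_old, J, -1)  # W = -1: divided by k**2

# The total mass is an ordinary scalar: the two factors of det(J) cancel.
mass_old = sigma_old * area_old
mass_new = sigma_new * area_new
print(area_new, sigma_new, mass_old, mass_new)
```

The product of a weight +1 quantity and a weight −1 quantity has weight zero, which is why the total mass comes out the same in both sets of units.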


In Weyl’s apt characterization,¹⁷ tensors represent intensities, 
while tensor densities measure quantity. 


4.6.2 The Levi-Civita symbol 


Although there is no tensorial vector cross product, we can define 
a similar operation whose output is a tensor density. This is most 
easily expressed in terms of the Levi-Civita symbol ε. (See p. 92 for 
biographical information about Levi-Civita.) 


In n dimensions, the Levi-Civita symbol has n indices. It is 
defined so as to be totally antisymmetric, in the sense that if any two of 
the indices are interchanged, its sign flips. This is sufficient to define 
the symbol completely except for an over-all scaling, which is fixed 
by arbitrarily taking one of the nonvanishing elements and setting 
it to +1. To see that this is enough to define ε completely, first note 
that it must vanish when any index is repeated. For example, in 
three dimensions labeled by κ, λ, and μ, $\epsilon_{\kappa\lambda\lambda}$ is unchanged under 
an interchange of the second and third indices, but it must also flip 
its sign under this operation, which means that it must be zero. If 
we arbitrarily fix $\epsilon_{\kappa\lambda\mu} = +1$, then interchange of the second and 
third indices gives $\epsilon_{\kappa\mu\lambda} = -1$, and a further interchange of the 
first and second yields $\epsilon_{\mu\kappa\lambda} = +1$. Any permutation of the three 
distinct indices can be reached from any other by a series of such 
pairwise swaps, and whether the number of swaps is odd or even is 
uniquely determined.¹⁸ 
In Cartesian coordinates in three dimensions, it is conventional to 
choose $\epsilon_{xyz} = +1$ when x, y, and z form a right-handed spatial 
coordinate system. In four dimensions, we take $\epsilon_{txyz} = +1$ when t 
is future-timelike and (x, y, z) are right-handed. 
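

The definition above translates almost directly into code. This is a sketch, not a standard library routine: the function name `levi_civita` is made up, the indices are assumed to be the integers 0, 1, ..., n−1, and the parity of the permutation is found by counting inversions, which is one of several equivalent methods.

```python
def levi_civita(*indices):
    """Levi-Civita symbol with fixed values 0 and +/-1 for integer indices 0..n-1."""
    n = len(indices)
    if len(set(indices)) < n:
        return 0  # a repeated index forces the symbol to vanish
    # Each pairwise swap changes the parity of the number of inversions,
    # so the parity of this count is the parity of the permutation.
    inversions = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if indices[i] > indices[j]
    )
    return -1 if inversions % 2 else 1


# The chain of swaps described in the text, with kappa, lambda, mu -> 0, 1, 2.
print(levi_civita(0, 1, 2))  # starting value, fixed at +1
print(levi_civita(0, 2, 1))  # second and third indices swapped
print(levi_civita(2, 0, 1))  # then first and second swapped
print(levi_civita(0, 1, 1))  # repeated index
```

The same function works in four dimensions, e.g. `levi_civita(0, 1, 2, 3)` for the ordering (t, x, y, z).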


In Euclidean three-space, in coordinates such that g = diag(1, 1, 1), 
the vector cross product A = u × v, where we have in mind the interpretation 
of A as area, can be expressed as $A_\mu = \epsilon_{\mu\kappa\lambda} u^\kappa v^\lambda$. 


Self-check: Check that this matches up with the more familiar 
definition of the vector cross product. 
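

For readers who would like to see the self-check carried out numerically, here is a small sketch. The names `levi_civita3`, `cross_via_epsilon`, and `cross_familiar` are made up for illustration, and the indices x, y, z are represented by 0, 1, 2 in Cartesian coordinates, where upper and lower indices need not be distinguished.

```python
def levi_civita3(i, j, k):
    """Levi-Civita symbol in three dimensions for indices 0, 1, 2."""
    # The product vanishes if any two indices are equal and is +/-2 otherwise.
    return (i - j) * (j - k) * (k - i) // 2


def cross_via_epsilon(u, v):
    """A_m = sum over k, l of epsilon_{mkl} u^k v^l."""
    return [
        sum(levi_civita3(m, k, l) * u[k] * v[l] for k in range(3) for l in range(3))
        for m in range(3)
    ]


def cross_familiar(u, v):
    """Component formula for the cross product from freshman mechanics."""
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


u = [1, 2, 3]
v = [4, 5, 6]
print(cross_via_epsilon(u, v), cross_familiar(u, v))
```

Swapping `u` and `v` flips the sign of every component, as the antisymmetry of ε requires.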


¹⁷ Hermann Weyl, “Space-Time-Matter,” 1922, p. 109, available online at 
archive.org/details/spacetimematter00weyluoft. 
¹⁸ For a proof, see the Wikipedia article “Parity of a permutation.” 


Now suppose that we want to generalize to curved spaces, where 
g cannot be constant. There are two ways to proceed. 


Tensorial ε 


One is to let ε have the values 0 and ±1 at some arbitrarily 
chosen point, in some arbitrarily chosen coordinate system, but to 
let it transform like a tensor. Then $A_\mu = \epsilon_{\mu\kappa\lambda} u^\kappa v^\lambda$ needs to be 
modified, since the right-hand side is a tensor, and that would make 
A a tensor, but if A is an area we don’t want it to transform like 
a rank-1 tensor. We therefore need to revise the definition of area to 
be $A_\mu = g^{-1/2} \epsilon_{\mu\kappa\lambda} u^\kappa v^\lambda$, where g is the determinant of the lower-index 
form of the metric. The following two examples justify this 
procedure in a locally Euclidean three-space. 


Scaling coordinates with tensorial ε Example: 24 
Then scaling of coordinates by k scales all the elements of the 
metric by $k^{-2}$, g by $k^{-6}$, $g^{-1/2}$ by $k^3$, $\epsilon_{\mu\kappa\lambda}$ by $k^{-3}$, and $u^\kappa v^\lambda$ by 
$k^2$. The result is to scale $A_\mu$ by $k^{+3-3+2} = k^2$, which makes sense 
if A is an area. 


Oblique coordinates with tensorial ε Example: 25 
In oblique coordinates (example 9, p. 104), the two basis vectors 
have unit length but are at an angle $\phi \neq \pi/2$ to one another. The 
determinant of the metric is $g = \sin^2\phi$, so $\sqrt{g} = \sin\phi$, which is 
exactly the correction factor needed in order to get the right area 
when u and v are the two basis vectors. 
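

The determinant quoted in example 25 is easy to verify. The sketch below assumes that the metric in oblique coordinates has the dot products of the two unit basis vectors as its elements; the function names and the sample angle of 60 degrees are made up for illustration.

```python
import math


def oblique_metric(phi):
    """Metric whose elements are the dot products of two unit basis vectors at angle phi."""
    c = math.cos(phi)
    return [[1.0, c], [c, 1.0]]


def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


phi = math.radians(60.0)          # sample angle between the basis vectors
g = det2(oblique_metric(phi))     # determinant of the metric

# Elementary geometry: a parallelogram with unit sides and angle phi
# has area base * height = sin(phi).
area_of_basis_parallelogram = math.sin(phi)

print(g, math.sqrt(g), area_of_basis_parallelogram)
```

When phi is a right angle the metric reduces to the identity, the determinant is 1, and no correction factor is needed.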


This procedure works more generally, the sole modification being 
that in a space such as a locally Lorentzian one where g < 0 we need 
to use $\sqrt{-g}$ as the correction factor rather than $\sqrt{g}$. 


Tensor-density ε 


The other option is to let ε have the same 0 and ±1 values at 
all points. Then ε is clearly not a tensor, because it doesn’t scale 
by a factor of $k^n$ when the coordinates are scaled by k; ε is a tensor 
density with weight −1 for the upper-index version and +1 for the 
lower-index one. The relation $A_\mu = \epsilon_{\mu\kappa\lambda} u^\kappa v^\lambda$ gives an area that is 
a tensor density, not a tensor, because A is not written in terms of 
purely tensorial quantities. Scaling the coordinates by k leaves $\epsilon_{\mu\kappa\lambda}$ 
unchanged, scales up $u^\kappa v^\lambda$ by $k^2$, and scales up the area by $k^2$, as 
expected. 


Unfortunately, there is no consistency in the literature as to 
whether ε should be a tensor or a tensor density. Some authors 
define both a tensor and a nontensor version, with notations like 
$\epsilon$ and $\tilde{\epsilon}$, or¹⁹ $\epsilon_{0123}$ and [0123]. Others avoid writing the letter ε 
completely.²⁰ The tensor-density version is convenient because we 
always know that its value is 0 or ±1. The tensor version has the 
advantage that it transforms as a tensor. 


¹⁹ Misner, Thorne, and Wheeler 
²⁰ Hawking and Ellis 


4.6.3 Spacetime volume 


We saw on p. 53 that area in the 1+1-dimensional plane of flat 
spacetime is preserved by a Lorentz boost. This makes sense because 
when we express the area spanned by a parallelogram with edges p 
and q as $\epsilon^{ab} p_a q_b$, all the indices have been contracted, leaving a 
rank-0 tensor density. In 3+1 dimensions, we have the spacetime 
volume $V = \epsilon^{abcd} p_a q_b r_c s_d$ spanned by the parallelepiped with edges 
p, q, r, and s. A typical situation in which this volume is nonzero 
would be that in which one of the vectors is timelike and the other 
three spacelike. Let the timelike one be p. Assume |p| = 1, since 
an example with |p| ≠ 1 can be reduced to this by scaling. Then 
p can be interpreted as the velocity vector of some observer, and 
V as the spatial volume that the observer says is spanned by the 
3-parallelepiped with edges q, r, and s. 
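

The claim that a boost preserves these contractions can be tested numerically. The following is a sketch under stated assumptions: the helper names (`permutation_sign`, `epsilon_contract`, `boost_tx`) are made up, ε is taken to be the version with fixed values 0 and ±1, units are such that c = 1, and the components are simply boosted with the standard boost matrix, whose determinant is 1 for either index placement.

```python
import math
from itertools import permutations


def permutation_sign(perm):
    """+1 for an even permutation of 0..n-1, -1 for an odd one."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def epsilon_contract(*vectors):
    """Contract the Levi-Civita symbol with n vectors in n dimensions."""
    n = len(vectors)
    total = 0.0
    # Only permutations of distinct indices contribute, since epsilon vanishes otherwise.
    for perm in permutations(range(n)):
        term = float(permutation_sign(perm))
        for vec, index in zip(vectors, perm):
            term *= vec[index]
        total += term
    return total


def boost_tx(vec, v):
    """Boost the (t, x) components with velocity v; other components are unchanged."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    t, x = vec[0], vec[1]
    return [gamma * (t - v * x), gamma * (x - v * t)] + list(vec[2:])


v = 0.6

# 1+1 dimensions: area of the parallelogram with edges p and q.
p2, q2 = [1.0, 0.0], [0.0, 2.0]
area_before = epsilon_contract(p2, q2)
area_after = epsilon_contract(boost_tx(p2, v), boost_tx(q2, v))

# 3+1 dimensions: p is an observer's velocity vector, and q, r, s are spatial edges.
edges = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 3.0]]
volume_before = epsilon_contract(*edges)
volume_after = epsilon_contract(*[boost_tx(e, v) for e in edges])

print(area_before, area_after, volume_before, volume_after)
```

In the 3+1 case the unboosted result equals the ordinary volume of the box with sides 1, 2, and 3, which is the spatial volume seen by the observer whose velocity vector is p.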


4.6.4 Angular momentum 


As discussed above, angular momentum cannot be a rank-1 tensor. 
One approach is to define a rank-2 angular momentum tensor 
$L^{ab} = r^a p^b - r^b p^a$. 


In a frame whose origin is instantaneously moving along with 
a certain system’s center of mass at a certain time, the time-space 
components of L vanish, and the components $L^{yz}$, $L^{zx}$, and $L^{xy}$ 
coincide in the nonrelativistic limit with the x, y, and z components 
of the Newtonian angular momentum vector. We can also define 
a three-dimensional object $L^a = \epsilon^{abc} L_{bc}$ (with three-dimensional 
tensor-density ε in the spatial dimensions) that doesn’t transform 
like a tensor. 
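

As a concrete illustration of the rank-2 definition, the sketch below builds the antisymmetric array of components from a position and a momentum and compares its space-space part with the familiar cross product. The function name `angular_momentum_tensor` and the sample numbers are made up, and the components are ordered (t, x, y, z), i.e., indices 0 through 3.

```python
def angular_momentum_tensor(r, p):
    """Components L^{ab} = r^a p^b - r^b p^a as a nested list."""
    n = len(r)
    return [[r[a] * p[b] - r[b] * p[a] for b in range(n)] for a in range(n)]


def cross(u, v):
    """Ordinary three-dimensional cross product."""
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


T, X, Y, Z = 0, 1, 2, 3

r = [0.0, 1.0, 2.0, 3.0]   # (t, x, y, z), made-up values
p = [5.0, 4.0, 5.0, 6.0]   # (E, p_x, p_y, p_z), made-up values

L = angular_momentum_tensor(r, p)
newtonian = cross(r[1:], p[1:])

# L^{yz}, L^{zx}, and L^{xy} play the role of the x, y, and z components.
print([L[Y][Z], L[Z][X], L[X][Y]], newtonian)
```

Because the array is antisymmetric, only six of its sixteen entries are independent: three space-space components and three time-space components.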
Problems 


1 Describe the four-velocity of a photon. 
> Solution, p. 391 


2 The Large Hadron Collider is designed to accelerate protons 
to energies of 7 TeV. Find 1 − v for such a proton. 
> Solution, p. 391 


3 Prove that an electron in a vacuum cannot absorb a photon. 
(This is the reason that the ability of materials to absorb gamma- 
rays is strongly dependent on atomic number Z. The case of Z = 0 
corresponds to the vacuum.) 


4 (a) For an object moving in a circle at constant speed, the 
dot product of the classical three-vectors v and a is zero. Give 
an interpretation in terms of the work-kinetic energy theorem. (b) 
In the case of relativistic four-vectors, $v^i a_i = 0$ for any world-line. 
Give a similar interpretation. Hint: find the rate of change of the 
four-velocity’s squared magnitude. 


5 Starting from coordinates (t,x) having a Lorentzian metric 
g, transform the metric tensor into reflected coordinates (t′, x′) = 
(t, −x), and verify that g′ is the same as g. 


6 Starting from coordinates (t,x) having a Lorentzian metric g, 
transform the metric tensor into Lorentz-boosted coordinates (t’, x’), 
and verify that g’ is the same as g. 


7 Verify the transformation of the metric given in example 19 
on page 140. 


8 A skeptic claims that the Hafele-Keating experiment can only 
be explained correctly by relativity in a frame in which the earth’s 
axis is at rest. Prove mathematically that this is incorrect. Does it 
matter whether the frame is inertial? > Solution, p. 391 


9 Assume the metric g = diag(+1,+1,+1). Which of the fol- 
lowing correctly expresses the noncommutative property of ordinary 
matrix multiplication? 


A? By # By Asx 


10 Example 10 on page 130 introduced the Dirac sea, whose 
existence is implied by the two roots of the relativistic relation 
$E = \pm\sqrt{p^2 + m^2}$. Prove that a Lorentz boost will never transform a 
positive-energy state into a negative-energy state. 

> Solution, p. 392 


11 On page 133, we found the relativistic Doppler shift in 1+1 
dimensions. Extend this to 3+1 dimensions, and check your result 
against the one given by Einstein on page ??. 
> Solution, p. 392 


12 Estimate the energy contained in the electric field of an 
electron, if the electron’s radius is r. Classically (i.e., assuming 
relativity but no quantum mechanics), this energy contributes to the 
electron’s rest mass, so it must be less than the rest mass. Estimate 
the resulting lower limit on r, which is known as the classical electron 
radius. > Solution, p. 392 


13 For gamma-rays in the MeV range, the most frequent mode of 
interaction with matter is Compton scattering, in which the photon 
is scattered by an electron without being absorbed. Only part of 
the gamma’s energy is deposited, and the amount is related to the 
angle of scattering. Use conservation of four-momentum to show 
that in the case of scattering at 180 degrees, the scattered photon 
has energy $E' = E/(1 + 2E/m)$, where m is the mass of the electron. 


14 Derive the equation $T = \sqrt{3\pi/G\rho}$ given on page 146 for the 
period of a rotating, spherical object that results in zero apparent 
gravity at its surface. 


15 Section 4.4.3 presented an estimate of the upper limit on the 
mass of a white dwarf. Check the self-consistency of the solution 
in the following respects: (1) Why is it valid to ignore the contri- 
bution of the nuclei to the degeneracy pressure? (2) Although the 
electrons are ultrarelativistic, spacetime is approximated as being 
flat. As suggested in example 14 on page 64, a reasonable order-of-magnitude 
check on this result is that we should have $M/r \ll c^2/G$. 


16 The laws of physics in our universe imply that for bodies with 
a certain range of masses, a neutron star is the unique equilibrium 
state. Suppose we knew of the existence of neutron stars, but didn’t 
know the mass of the neutron. Infer upper and lower bounds on the 
mass of the neutron. 


17 Example 20 on p. 141 briefly introduced the electromagnetic 
potential four-vector $F_{ij}$, and this implicitly defines the transformation 
properties of the electric and magnetic fields under a Lorentz 
boost v. To lowest order in v, this transformation is given by 


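$\mathbf{E}' \approx \mathbf{E} + \mathbf{v} \times \mathbf{B}$ and 
$\mathbf{B}' \approx \mathbf{B} - \mathbf{v} \times \mathbf{E}$. 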


I’m not a historian of science, but apparently ca. 1905 people like 
Hertz believed that these were the exact transformations of the 
field.²¹ Show that this can’t be the case, because performing two 
such transformations in a row does not in general result in a trans- 
formation of the same form. > Solution, p. 392 


²¹ Montigny and Rousseaux, arxiv.org/abs/physics/0512200. 


18 We know of massive particles, whose velocity vectors always 
lie inside the future light cone, and massless particles, whose velocities 
lie on it. In principle, we could have a third class of particles, 
called tachyons, with spacelike velocity vectors. Tachyons would 
have $m^2 < 0$, i.e., their masses would have to be imaginary. Show 
that it is possible to pick momentum four-vectors $p_1$ and $p_2$ for 
a pair of tachyons such that $p_1 + p_2 = 0$. This implies that the 
vacuum would be unstable with respect to spontaneous creation of 
tachyon-antitachyon pairs. 


Chapter 5 
Curvature 


General relativity describes gravitation as a curvature of spacetime, 
with matter acting as the source of the curvature in the same way 
that electric charge acts as the source of electric fields. Our goal is 
to arrive at Einstein’s field equations, which relate the local intrinsic 
curvature to the locally ambient matter in the same way that 
Gauss’s law relates the local divergence of the electric field to the 
charge density. The locality of the equations is necessary because 
relativity has no action at a distance; cause and effect propagate at 
a maximum velocity of c (= 1). 


The hard part is arriving at the right way of defining curvature. 
We've already seen that it can be tricky to distinguish intrinsic 
curvature, which is real, from extrinsic curvature, which can never 
produce observable effects. E.g., example 5 on page 96 showed that 
spheres have intrinsic curvature, while cylinders do not. The mani- 
festly intrinsic tensor notation protects us from being misled in this 
respect. If we can formulate a definition of curvature expressed using 
only tensors that are expressed without reference to any preordained 
coordinate system, then we know it is physically observable, and not 
just a superficial feature of a particular model. 


As an example, drop two rocks side by side, b. Their trajectories 
are vertical, but on a (t,x) coordinate plot rendered in the Earth’s 
frame of reference, they appear as parallel parabolas. The curva- 
ture of these parabolas is extrinsic. The Earth-fixed frame of refer- 
ence is defined by an observer who is subject to non-gravitational 
forces, and is therefore not a valid Lorentz frame. In a free-falling 
Lorentz frame (t’, x’), the two rocks are either motionless or moving 
at constant velocity in straight lines. We can therefore see that the 
curvature of world-lines in a particular coordinate system is not an 
intrinsic measure of curvature; it can arise simply from the choice 
of the coordinate system. What would indicate intrinsic curvature 
would be, for example, if geodesics that were initially parallel were 
to converge or diverge. 


Nor is the metric a measure of intrinsic curvature. In example 
19 on page 140, we found the metric for an accelerated observer to 
be 
$g'_{t't'} = (1 + ax')^2, \qquad g'_{x'x'} = -1,$ 
where the primes indicate the accelerated observer’s frame. The fact 
that the timelike element is not equal to 1 is not an indication of 


[Figure a: schematic equation relating the local curvature to the 
local matter and a cosmological constant.] 


a / The expected structure of 
the field equations in general 
relativity. 


b / Two rocks are dropped 
side by side. The curvatures of 
their world-lines are not intrinsic. 
In a free-falling frame, both would 
appear straight. If initially parallel 
world-lines became non-parallel, 
that would be evidence of intrinsic 
curvature. 


a / Tidal forces disrupt comet 
Shoemaker-Levy. 
Shoemaker-Levy. 


intrinsic curvature. It arises only from the choice of the coordinates 
(t’, x’) defined by a frame tied to the accelerating rocket ship. 


The fact that the above metric has nonvanishing derivatives, un- 
like a constant Lorentz metric, does indicate the presence of a grav- 
itational field. However, a gravitational field is not the same thing 
as intrinsic curvature. The gravitational field seen by an observer 
aboard the ship is, by the equivalence principle, indistinguishable 
from an acceleration, and indeed the Lorentzian observer in the 
earth’s frame does describe it as arising from the ship’s accelera- 
tion, not from a gravitational field permeating all of space. Both 
observers must agree that “I got plenty of nothin’ ” — that the 
region of the universe to which they have access lacks any stars, 
neutrinos, or clouds of dust. The observer aboard the ship must de- 
scribe the gravitational field he detects as arising from some source 
very far away, perhaps a hypothetical vast sheet of lead lying billions 
of light-years aft of the ship’s deckplates. Such a hypothesis is fine, 
but it is unrelated to the structure of our hoped-for field equation, 
which is to be local in nature. 


Not only does the metric tensor not represent the gravitational 
field, but no tensor can represent it. By the equivalence princi- 
ple, any gravitational field seen by observer A can be eliminated by 
switching to the frame of a free-falling observer B who is instanta- 
neously at rest with respect to A at a certain time. The structure of 
the tensor transformation law guarantees that A and B will agree on 
whether a given tensor is zero at the point in spacetime where they 
pass by one another. Since they agree on all tensors, and disagree 
on the gravitational field, the gravitational field cannot be a tensor. 


We therefore conclude that a nonzero intrinsic curvature of the 
type that is to be included in the Einstein field equations is not 
encoded in any simple way in the metric or its first derivatives. 
Since neither the metric nor its first derivatives indicate curvature, 
we can reasonably conjecture that the curvature might be encoded 
in its second derivatives. 


5.1 Tidal curvature versus curvature caused by local sources 


A further complication is the need to distinguish tidal curva- 
ture from curvature caused by local sources. Figure a shows Comet 
Shoemaker-Levy, broken up into a string of fragments by Jupiter’s 
tidal forces shortly before its spectacular impact with the planet in 
1994. Immediately after each fracture, the newly separated chunks 
had almost zero velocity relative to one another, so once the comet 
finished breaking up, the fragments’ world-lines were a sheaf of 
nearly parallel lines separated by spatial distances of only 1 km. 
These initially parallel geodesics then diverged, eventually fanning 
out to span millions of kilometers. 


If initially parallel lines lose their parallelism, that is clearly an 
indication of intrinsic curvature. We call it a measure of sectional 
curvature, because the loss of parallelism occurs within a particular 
plane, in this case the (t,x) plane represented by figure b. 


But this curvature was not caused by a local source lurking in 
among the fragments. It was caused by a distant source: Jupiter. 
We therefore see that the mere presence of sectional curvature is not 
enough to demonstrate the existence of local sources. Even the sign 
of the sectional curvature is not a reliable indication. Although this 
example showed a divergence of initially parallel geodesics, referred 
to as a negative curvature, it is also possible for tidal forces exerted 
by distant masses to create positive curvature. For example, the 
ocean tides on earth oscillate both above and below mean sea level, c. 


As an example that really would indicate the presence of a local 
source, we could release a cloud of test masses at rest in a spheri- 
cal shell around the earth, and allow them to drop, d. We would 
then have positive and equal sectional curvature in the t-x, t-y, 
and t-z planes. Such an observation cannot be due to a distant 
mass. It demonstrates an over-all contraction of the volume of an 
initially parallel sheaf of geodesics, which can never be induced by 
tidal forces. The earth’s oceans, for example, do not change their 
total volume due to the tides, and this would be true even if the 
oceans were a gas rather than an incompressible fluid. It is a unique 
property of $1/r^2$ forces such as gravity that they conserve volume 
in this way; this is essentially a restatement of Gauss’s law in a 
vacuum. 
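

A Newtonian sketch makes the special role of the $1/r^2$ force law concrete. In Newtonian terms, the relative accelerations of nearby test particles are governed by the matrix of second derivatives of the potential, and the trace of that matrix controls the first-order change in the volume of a small cloud. The code below estimates the trace by finite differences for a potential proportional to $-1/r^n$. The function names, the step size, the sample point, and the units (source strength set to 1) are all assumptions made for illustration.

```python
def potential(x, y, z, n):
    """Potential -1/r**n in units where the source strength is 1; n = 1 gives a 1/r**2 force."""
    r = (x * x + y * y + z * z) ** 0.5
    return -1.0 / r ** n


def tidal_trace(point, n, h=1e-3):
    """Trace of the matrix of second derivatives of the potential, by central differences."""
    x, y, z = point
    center = potential(x, y, z, n)
    trace = 0.0
    for dx, dy, dz in ((h, 0.0, 0.0), (0.0, h, 0.0), (0.0, 0.0, h)):
        plus = potential(x + dx, y + dy, z + dz, n)
        minus = potential(x - dx, y - dy, z - dz, n)
        trace += (plus - 2.0 * center + minus) / (h * h)
    return trace


point = (1.0, 2.0, 2.0)   # a point in vacuum, at distance r = 3 from the source

trace_inverse_square = tidal_trace(point, n=1)   # 1/r**2 force law
trace_inverse_cube = tidal_trace(point, n=2)     # hypothetical 1/r**3 force law

print(trace_inverse_square, trace_inverse_cube)
```

For the inverse-square law the trace vanishes to within the accuracy of the finite-difference estimate, so a distant mass stretches a cloud along one axis and squeezes it along the others without changing its volume to first order. For the hypothetical inverse-cube law the trace is nonzero.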


5.2 The stress-energy tensor 


In general, the curvature of spacetime will contain contributions 
from both tidal forces and local sources, superimposed on one an- 
other. To develop the right formulation for the Einstein field equa- 
tions, we need to eliminate the tidal part. Roughly speaking, we 
will do this by averaging the sectional curvature over all three of the 
planes t-x, t-y, and t-z, giving a measure of curvature called the 
Ricci curvature. The “roughly speaking” is because such a prescrip- 
tion would treat the time and space coordinates in an extremely 
asymmetric manner, which would violate local Lorentz invariance. 


To get an idea of how this would work, let’s compare with the 
Newtonian case, where there really is an asymmetry between the 
treatment of time and space. In the Cartan curved-spacetime the- 
ory of Newtonian gravity (page 41), the field equation has a kind of 
scalar Ricci curvature on one side, and on the other side is the den- 
sity of mass, which is also a scalar. In relativity, however, the source 




b / Tidal forces cause the initially 
parallel world-lines of the 
fragments to diverge. The spacetime 
occupied by the comet has 
intrinsic curvature, but it is not 
caused by any local mass; it is 
caused by the distant mass of 
Jupiter. 


c / The moon’s gravitational 
field causes the Earth’s oceans to 
be distorted into an ellipsoid. The 
sign of the sectional curvature is 
negative in the x-t plane, but 
positive in the y-t plane. 


d / A cloud of test masses is 
released at rest in a spherical 
shell around the earth, shown 
here as a circle because the z 
axis is omitted. The volume of 
the shell contracts over time, 
which demonstrates that the 
local curvature of spacetime is 
generated by a local source, 
the earth, rather than some 
distant one. 


a / This curve has no intrinsic 
curvature. 


b/A surveyor on a mountaintop 
uses a heliotrope. 


c/A map of a triangulation 
survey such as the one Gauss 
carried out. By measuring the 
interior angles of the triangles, 
one can determine not just the 
two-dimensional projection of 
the grid but its complete three- 
dimensional form, including both 
the curvature of the earth (note 
the curvature of the lines of lat- 
itude) and the height of features 
above and below sea level. 


term in the equation clearly cannot be the scalar mass density. We 
know that mass and energy are equivalent in relativity, so for exam- 
ple the curvature of spacetime around the earth depends not just 
on the mass of its atoms but also on all the other forms of energy it 
contains, such as thermal energy and electromagnetic and nuclear 
binding energy. Can the source term in the Einstein field equations 
therefore be the mass-energy E? No, because E is merely the timelike 
component of a particle’s momentum four-vector. To single it 
out would violate Lorentz invariance just as much as an asymmetric 
treatment of time and space in constructing a Ricci measure of cur- 
vature. To get a properly Lorentz invariant theory, we need to find a 
way to formulate everything in terms of tensor equations that make 
no explicit reference to coordinates. The proper generalization of 
the Newtonian mass density in relativity is the stress-energy tensor 
$T^{ij}$, whose 16 elements measure the local density of mass-energy 
and momentum, and also the rate of transport of these quantities 
in various directions. If we happen to be able to find a frame of 
reference in which the local matter is all at rest, then $T^{tt}$ represents 
the mass density. The reason for the word “stress” in the name is 
that, for example, the flux of x-momentum in the x direction is a 
measure of pressure. 


For the purposes of the present discussion, it’s not necessary to 
introduce the explicit definition of T; the point is merely that we 
should expect the Einstein field equations to be tensor equations, 
which tells us that the definition of curvature we’re seeking clearly 
has to be a rank-2 tensor, not a scalar. The implications in four- 
dimensional spacetime are fairly complex. We’ll end up with a rank- 
4 tensor that measures the sectional curvature, and a rank-2 Ricci 
tensor derived from it that averages away the tidal effects. The 
Einstein field equations then relate the Ricci tensor to the energy- 
momentum tensor in a certain way. The stress-energy tensor is 
discussed further in section 8.1.2 on page 295. 


5.3 Curvature in two spacelike dimensions 


Since the curvature tensors in 3+1 dimensions are complicated, let’s 
start by considering lower dimensions. In one dimension, a, there 
is no such thing as intrinsic curvature. This is because curvature 
describes the failure of parallelism to behave as in E5, but there is 
no notion of parallelism in one dimension. 


The lowest interesting dimension is therefore two, and this case 
was studied by Carl Friedrich Gauss in the early nineteenth century. 
Gauss ran a geodesic survey of the state of Hanover, inventing an 
optical surveying instrument called a heliotrope that in effect was 
used to cover the Earth’s surface with a triangular mesh of light 
rays. If one of the mesh points lies, for example, at the peak of a 
mountain, then the sum $\sum\theta$ of the angles of the vertices meeting at 
that point will be less than 2π, in contradiction to Euclid. Although 
the light rays do travel through the air above the dirt, we can think 
of them as approximations to geodesics painted directly on the dirt, 
which would be intrinsic rather than extrinsic. The angular defect 
around a vertex now vanishes, because the space is locally Euclidean, 
but we now pick up a different kind of angular defect, which is that 
the interior angles of a triangle no longer add up to the Euclidean 
value of π. 


A polygonal survey of a soccer ball Example: 1 
Figure d applies similar ideas to a soccer ball, the only difference 
being the use of pentagons and hexagons rather than triangles. 


In d/1, the survey is extrinsic, because the lines pass below the 
surface of the sphere. The curvature is detectable because the 
angles at each vertex add up to 120 + 120 + 108 = 348 degrees, 
giving an angular defect of 12 degrees. 


In d/2, the lines have been projected to form arcs of great circles 
on the surface of the sphere. Because the space is locally Eu- 
clidean, the sum of the angles at a vertex has its Euclidean value 
of 360 degrees. The curvature can be detected, however, be- 
cause the sum of the internal angles of a polygon is greater than 
the Euclidean value. For example, each spherical hexagon gives 
a sum of 6 × 124.31 degrees, rather than the Euclidean 6 × 120. 
The angular defect of 6 × 4.31 degrees is an intrinsic measure of 
curvature. 


Angular defect on the earth’s surface Example: 2 
Divide the Earth’s northern hemisphere into four octants, with 
their boundaries running through the north pole. These octants 
have sides that are geodesics, so they are equilateral triangles. 
Assuming Euclidean geometry, the interior angles of an equilat- 
eral triangle are each equal to 60 degrees, and, as with any tri- 
angle, they add up to 180 degrees. The octant-triangle in figure 
e has angles that are each 90 degrees, and the sum is 270. This 
shows that the Earth’s surface has intrinsic curvature. 


This example suggests another way of measuring intrinsic curvature,
in terms of the ratio $C/r$ of the circumference of a circle to
its radius. In Euclidean geometry, this ratio equals $2\pi$. Let $\rho$ be
the radius of the Earth, and consider the equator to be a circle
centered on the north pole, so that its radius is the length of one
of the sides of the triangle in figure e, $r = (\pi/2)\rho$. (Don’t confuse
$r$, which is intrinsic, with $\rho$, the radius of the sphere, which is
extrinsic and not equal to $r$.) Then the ratio $C/r$ is equal to 4, which
is smaller than the Euclidean value of $2\pi$.
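
The ratio $C/r$ can be tabulated for circles of any size on the sphere. The sketch below assumes the standard embedding result that a circle of intrinsic radius $r$ on a sphere of radius $\rho$ has circumference $C = 2\pi\rho\sin(r/\rho)$; the function name is hypothetical. It shows the ratio falling from the Euclidean $2\pi$ for small circles to 4 for the equator.

```python
import math


def circumference_over_radius(r, rho):
    """C/r for a circle of intrinsic (geodesic) radius r on a sphere of radius rho."""
    circumference = 2.0 * math.pi * rho * math.sin(r / rho)
    return circumference / r


rho = 1.0
ratio_small = circumference_over_radius(1e-4 * rho, rho)            # tiny circle
ratio_equator = circumference_over_radius(math.pi * rho / 2.0, rho)  # equator

print(f"small circle: C/r = {ratio_small:.6f}  (2*pi = {2 * math.pi:.6f})")
print(f"equator:      C/r = {ratio_equator:.6f}")
```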


Let $\epsilon = \sum\theta - \pi$ be the angular defect of a triangle, and for
concreteness let the triangle be in a space with an elliptic geometry,
so that it has constant curvature and can be modeled as a sphere of
radius $\rho$, with antipodal points identified.


d / Example 1.


e / Example 2.


f / Proof that the angular defect of a triangle in elliptic geometry
is proportional to its area. Each white circle represents the entire
elliptic plane. The dashed line at the edge is not really a boundary;
lines that go off the edge simply wrap back around. In the spherical
model, the white circle corresponds to one hemisphere, which is
identified with the opposite hemisphere.


g / Gaussian normal coordinates on a sphere.


Self-check: In elliptic geometry, what is the minimum possible 
value of the quantity C/r discussed in example 2? How does this 
differ from the case of spherical geometry? 


We want a measure of curvature that is local, but if our space
is locally flat, we must have $\epsilon \to 0$ as the size of the triangles
approaches zero. This is why Euclidean geometry is a good
approximation for small-scale maps of the earth. The discrete nature of
the triangular mesh is just an artifact of the definition, so we want
a measure of curvature that, unlike $\epsilon$, approaches some finite limit
as the scale of the triangles approaches zero. Should we expect this
scaling to go as $\epsilon \propto \rho$? $\rho^2$? Let’s determine the scaling. First
we prove a classic lemma by Gauss, concerning a slightly different
version of the angular defect, for a single triangle.


Theorem: In elliptic geometry, the angular defect $\epsilon = \alpha + \beta + \gamma - \pi$
of a triangle is proportional to its area $A$.
Proof: By axiom E2, extend each side of the triangle to form a line,
figure f/1. Each pair of lines crosses at only one point (E1) and
divides the plane into two lunes with their four vertices touching at
this point, figure f/2. Of the six lunes, we focus on the three shaded
ones, which overlap the triangle. In each of these, the two interior
angles at the vertex are the same (Euclid I.15). The area of a lune
is proportional to its interior angle, as follows from dissection into
narrower lunes; since a lune with an interior angle of $\pi$ covers the
entire area $P$ of the plane, the constant of proportionality is $P/\pi$.
The sum of the areas of the three lunes is $(P/\pi)(\alpha + \beta + \gamma)$, but
these three areas also cover the entire plane, overlapping three times
on the given triangle, and therefore their sum also equals $P + 2A$.
Equating the two expressions gives $\epsilon = (2\pi/P)A$, which is the
desired result.


This calculation was purely intrinsic, because it made no use of
any model or coordinates. We can therefore construct a measure
of curvature that we can be assured is intrinsic, $K = \epsilon/A$. This is
called the Gaussian curvature, and in elliptic geometry it is constant
rather than varying from point to point. In the model on a sphere
of radius $\rho$, we have $K = 1/\rho^2$.


Self-check: Verify the equation $K = 1/\rho^2$ by considering a triangle
covering one octant of the sphere, as in example 2.
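
Here is one way the self-check might be carried out numerically, as a sketch rather than a definitive method. The corner angles are measured between tangent vectors at each vertex, and the area is found independently from the solid-angle formula of Van Oosterom and Strackee, $\tan(\Omega/2) = |a\cdot(b\times c)|/(1 + a\cdot b + b\cdot c + c\cdot a)$ for unit vectors, which is outside knowledge not used in the text. All function names are hypothetical. The octant and an arbitrary smaller triangle should both give $K\rho^2 = 1$.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(u):
    norm = math.sqrt(dot(u, u))
    return tuple(ui / norm for ui in u)


def corner_angle(a, b, c):
    """Interior angle at unit vector a between great-circle arcs a-b and a-c."""
    tangent_b = unit(tuple(bi - dot(a, b) * ai for ai, bi in zip(a, b)))
    tangent_c = unit(tuple(ci - dot(a, c) * ai for ai, ci in zip(a, c)))
    return math.acos(max(-1.0, min(1.0, dot(tangent_b, tangent_c))))


def gaussian_curvature(a, b, c, rho):
    """K = (angular defect)/(area) for a geodesic triangle on a sphere of radius rho."""
    a, b, c = unit(a), unit(b), unit(c)
    defect = (corner_angle(a, b, c) + corner_angle(b, c, a)
              + corner_angle(c, a, b) - math.pi)
    solid_angle = 2.0 * math.atan2(abs(dot(a, cross(b, c))),
                                   1.0 + dot(a, b) + dot(b, c) + dot(c, a))
    return defect / (rho**2 * solid_angle)


rho = 6.371e6  # meters; roughly the earth's radius (assumed value)
K_octant = gaussian_curvature((1, 0, 0), (0, 1, 0), (0, 0, 1), rho)
K_generic = gaussian_curvature((1, 0.2, 0.1), (0.1, 1, 0.3), (0.2, 0.1, 1), rho)

print(f"octant:  K * rho^2 = {K_octant * rho**2:.9f}")
print(f"generic: K * rho^2 = {K_generic * rho**2:.9f}")
```

For the octant the defect is $3(\pi/2) - \pi = \pi/2$ and the area is one eighth of $4\pi\rho^2$, so the ratio is $1/\rho^2$ by hand as well.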


It is useful to introduce normal or Gaussian normal coordi- 
nates, defined as follows. Through point O, construct perpendicular 
geodesics, and define affine coordinates x and y along these. For 
any point P off the axis, define coordinates by constructing the lines 
through P that cross the axes perpendicularly. For P in a suffi- 
ciently small neighborhood of O, these lines exist and are uniquely 
determined. Gaussian polar coordinates can be defined in a similar
way.


Here are two useful interpretations of $K$.


1. The Gaussian curvature measures the failure of parallelism in
the following sense. Let line $\ell$ be constructed so that it crosses the
normal $y$ axis at $(0, dy)$ at an angle that differs from perpendicular
by the infinitesimal amount $d\alpha$ (figure h). Construct the line
$x' = dx$, and let $d\alpha'$ be the angle its perpendicular forms with $\ell$. Then¹
the Gaussian curvature at O is

$$K = \frac{d^2\alpha}{dx\,dy} ,$$

where $d^2\alpha = d\alpha' - d\alpha$.


2. From a point P, emit a fan of rays at angles filling a certain
range $\theta$ of angles in Gaussian polar coordinates (figure i). Let the
arc length of this fan at $r$ be $L$, which may not be equal to its
Euclidean value $L_E = r\theta$. Then²

$$K = -3\,\frac{d^2}{dr^2}\left(\frac{L}{L_E}\right) .$$
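
The second interpretation can be checked numerically in the spherical model. The sketch below assumes the relation $K = -3\,d^2(L/L_E)/dr^2$ evaluated as $r \to 0$, which is what the footnoted calculation implies, together with $L/L_E = \sin u/u$ and $u = r/\rho$. The function names and the step size are hypothetical choices. The second derivative is estimated by a central difference about $r = 0$, where the ratio equals 1 and is even in $r$.

```python
import math


def fan_ratio(r, rho):
    """L/L_E = sin(u)/u with u = r/rho, for a fan of rays on a sphere."""
    u = r / rho
    return math.sin(u) / u


def curvature_from_fan(rho, h):
    """Estimate K = -3 d^2(L/L_E)/dr^2 at r -> 0 with a central difference.

    The ratio tends to 1 at r = 0 and is even in r, so f(-h) = f(h).
    """
    f_plus = fan_ratio(h, rho)
    second_derivative = (f_plus - 2.0 * 1.0 + f_plus) / h**2
    return -3.0 * second_derivative


rho = 2.0
K_estimate = curvature_from_fan(rho, h=1e-3 * rho)

print(f"estimated K = {K_estimate:.8f}")
print(f"1/rho^2     = {1.0 / rho**2:.8f}")
```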


Let’s now generalize beyond elliptic geometry. Consider a space
modeled by a surface embedded in three dimensions, with geodesics
defined as curves of extremal length, i.e., the curves made by a piece
of string stretched taut across the surface. At a particular point
P, we can always pick a coordinate system $(x, y, z)$ such that the
surface $z = \frac{1}{2}k_1x^2 + \frac{1}{2}k_2y^2$ locally approximates the surface to the
level of precision needed in order to discuss curvature. The surface
is either paraboloidal or hyperboloidal (a saddle), depending on the
signs of $k_1$ and $k_2$. We might naively think that $k_1$ and $k_2$ could be
independently determined by intrinsic measurements, but as we’ve
seen in example 5 on page 96, a cylinder is locally indistinguishable
from a Euclidean plane, so if one $k$ is zero, the other $k$ clearly cannot
be determined. In fact all that can be measured is the Gaussian
curvature, which equals the product $k_1k_2$. To see why this should
be true, first consider that any measure of curvature has units of
inverse distance squared, and the $k$’s have units of inverse distance.
The only possible intrinsic measures of curvature based on the $k$’s
are therefore $k_1^2 + k_2^2$ and $k_1k_2$. (We can’t have, for example, just $k_1^2$,
because that would change under an extrinsic rotation about the $z$
axis.) Only $k_1k_2$ vanishes on a cylinder, so it is the only possible
intrinsic curvature.
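
The claim that $K = k_1k_2$ at the origin can be compared with the standard differential-geometry formula for a surface given as a graph $z(x, y)$, $K = (z_{xx}z_{yy} - z_{xy}^2)/(1 + z_x^2 + z_y^2)^2$. That formula is outside knowledge, not something derived in the text, and the function name below is hypothetical. The sketch evaluates it for $z = \frac{1}{2}k_1x^2 + \frac{1}{2}k_2y^2$ in the bowl, saddle, and cylinder cases.

```python
def gaussian_curvature_quadratic(k1, k2, x=0.0, y=0.0):
    """Gaussian curvature of z = k1*x^2/2 + k2*y^2/2 at the point above (x, y)."""
    z_x, z_y = k1 * x, k2 * y          # first derivatives
    z_xx, z_yy, z_xy = k1, k2, 0.0     # second derivatives
    return (z_xx * z_yy - z_xy**2) / (1.0 + z_x**2 + z_y**2)**2


K_bowl = gaussian_curvature_quadratic(0.5, 0.5)       # paraboloid
K_saddle = gaussian_curvature_quadratic(0.5, -0.5)    # hyperboloidal saddle
K_cylinder = gaussian_curvature_quadratic(0.5, 0.0)   # curved in one direction only
K_far = gaussian_curvature_quadratic(0.5, 0.5, x=3.0, y=0.0)

print(K_bowl, K_saddle, K_cylinder, K_far)
```

The last value shows that away from the apex the curvature of the paraboloid is no longer $k_1k_2$, so the quadratic surface is only a local model.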


¹Proof: Since any two lines cross in elliptic geometry, $\ell$ crosses the $x$ axis. The
corollary then follows by application of the definition of the Gaussian curvature
to the right triangles formed by $\ell$, the $x$ axis, and the lines at $x = 0$ and $x = dx$,
so that $K = d\epsilon/dA = d^2\alpha/dx\,dy$, where third powers of infinitesimals have
been discarded.

²In the spherical model, $L = \rho\theta\sin u$, where $u$ is the angle subtended at the
center of the sphere by an arc of length $r$. We then have $L/L_E = \sin u/u$, whose
second derivative with respect to $u$, evaluated at $u = 0$, is $-1/3$. Since $r = \rho u$,
the second derivative of the same quantity with respect to $r$ equals
$-1/(3\rho^2) = -K/3$.


h / 1. Gaussian curvature can be interpreted as the failure
of parallelism represented by $d^2\alpha/dx\,dy$.


i / 2. Gaussian curvature as $L \ne r\theta$.


Section 5.3 Curvature in two spacelike dimensions 165 


j / A triangle in a space with
negative curvature has angles
that add to less than $\pi$.


k/A flea on the football can- 
not orient himself by intrinsic, 
local measurements. 


Eating pizza Example: 3
When people eat pizza by folding the slice lengthwise, they are
taking advantage of the intrinsic nature of the Gaussian curvature.
Once $k_1$ is fixed to a nonzero value, $k_2$ can’t change without
varying $K$, so the slice can’t droop.


Elliptic and hyperbolic geometry Example: 4
We’ve seen that figures behaving according to the axioms of
elliptic geometry can be modeled on part of a sphere, which is a
surface of constant $K > 0$. The model can be made into a global
one satisfying all the axioms if the appropriate topological
properties are ensured by identifying antipodal points. A paraboloidal
surface $z = k_1x^2 + k_2y^2$ can be a good local approximation to
a sphere, but for points far from its apex, $K$ varies significantly.
Elliptic geometry has no parallels; all lines meet if extended far
enough.


A space of constant negative curvature has a geometry called hy- 
perbolic, and is of some interest because it appears to be the one 
that describes the spatial dimensions of our universe on a cosmo- 
logical scale. A hyperboloidal surface works locally as a model, 
but its curvature is only approximately constant; the surface of 
constant curvature is a horn-shaped one created by revolving a 
mountain-shaped curve called a tractrix about its axis. The trac- 
trix of revolution is not as satisfactory a model as the sphere is 
for elliptic geometry, because lines are cut off at the cusp of the 
horn. Hyperbolic geometry is richer in parallels than Euclidean 
geometry; given a line $\ell$ and a point P not on $\ell$, there are infinitely
many lines through P that do not pass through $\ell$.


A flea on a football Example: 5
We might imagine that a flea on the surface of an American football
could determine by intrinsic, local measurements which
direction to go in order to get to the nearest tip. This is impossible,
because the flea would have to determine a vector, and curvature
cannot be a vector, since $z = \frac{1}{2}k_1x^2 + \frac{1}{2}k_2y^2$ is invariant under the
parity inversion $x \to -x$, $y \to -y$. For similar reasons, a measure
of curvature can never have odd rank.


Without violating reflection symmetry, it is still conceivable that the 
flea could determine the orientation of the tip-to-tip line running 
through his position. Surprisingly, even this is impossible. The 
flea can only measure the single number K, which carries no 
information about directions in space.


The lightning rod Example: 6
Suppose you have a pear-shaped conductor like the one in figure 
l/1. Since the pear is a conductor, there are free charges everywhere
inside it. Panels 1 and 2 of the figure show a computer simulation
with 100 identical electric charges. In 1, the charges are
released at random positions inside the pear. Repulsion causes 
them all to fly outward onto the surface and then settle down into 
an orderly but nonuniform pattern. 


We might not have been able to guess the pattern in advance, but 
we can verify that some of its features make sense. For example, 
charge A has more neighbors on the right than on the left, which 
would tend to make it accelerate off to the left. But when we 
look at the picture as a whole, it appears reasonable that this is 
prevented by the larger number of more distant charges on its left 
than on its right. 


There also seems to be a pattern to the nonuniformity: the charges 
collect more densely in areas like B, where the Gaussian curva- 
ture is large, and less densely in areas like C, where K is nearly 
zero (slightly negative). 


l / Example 6. In 1 and 2, charges that are visible on the
front surface of the conductor are shown as solid dots; the others
would have to be seen through the conductor, which we imagine
is semi-transparent.


To understand the reason for this pattern, consider l/3. It’s
straightforward to show that the density of charge $\sigma$ on each sphere is
inversely proportional to its radius, or proportional to $K^{1/2}$. Lord
Kelvin proved that on a conducting ellipsoid, the density of charge
is proportional to the distance from the center to the tangent
plane, which is equivalent³ to $\sigma \propto K^{1/4}$; this result looks similar
except for the different exponent. McAllister showed in 1990⁴
that this $K^{1/4}$ behavior applies to a certain class of examples, but
it clearly can’t apply in all cases, since, for example, $K$ could be
negative, or we could have a deep concavity, which would form
a Faraday cage. Problem 13 on p. 211 discusses the case of a
knife-edge.
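
The $K^{1/2}$ scaling for spheres can be illustrated with a short calculation. The sketch below assumes, since the figure is not reproduced here, that the setup consists of two widely separated conducting spheres joined by a thin wire so that they share a common potential $V$; each then carries charge $Q = 4\pi\epsilon_0 RV$. The function name and the numerical values are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def surface_charge_density(radius, potential):
    """Charge per unit area on an isolated conducting sphere held at `potential`."""
    charge = 4.0 * math.pi * EPS0 * radius * potential
    return charge / (4.0 * math.pi * radius**2)


V = 1000.0                    # volts, common to both spheres (assumed)
R_big, R_small = 0.10, 0.02   # meters (assumed)

sigma_big = surface_charge_density(R_big, V)
sigma_small = surface_charge_density(R_small, V)

# Gaussian curvature of a sphere is K = 1/R^2, so sqrt(K) = 1/R.
ratio_sigma = sigma_small / sigma_big
ratio_sqrt_K = math.sqrt((1.0 / R_small**2) / (1.0 / R_big**2))

print(f"sigma ratio:   {ratio_sigma:.6f}")
print(f"sqrt(K) ratio: {ratio_sqrt_K:.6f}")
```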


Similar reasoning shows why Benjamin Franklin used a sharp tip 
when he invented the lightning rod. The charged stormclouds in- 
duce positive and negative charges to move to opposite ends of 
the rod. At the pointed upper end of the rod, the charge tends 
to concentrate at the point, and this charge attracts the lightning. 
The same effect can sometimes be seen when a scrap of aluminum
foil is inadvertently put in a microwave oven. Modern experiments⁵
show that although a sharp tip is best at starting a
spark, a more moderate curve, like the right-hand tip of the pear 
in this example, is better at successfully sustaining the spark for 
long enough to connect a discharge to the clouds. 


³http://math.stackexchange.com/questions/112662/gaussian-curvature-of-an-ellipsoid-proportional-to-fourth-power-of-the-distance

⁴I W McAllister 1990 J. Phys. D: Appl. Phys. 23 359

⁵Moore et al., Journal of Applied Meteorology 39 (1999) 593


a / The definition of the Riemann tensor. The vector $v^b$ changes
by $dv^b$ when parallel-transported around the approximate
parallelogram. ($v^b$ is drawn on a scale that makes its length
comparable to the infinitesimals $dp^c$, $dq^d$, and $dv^b$; in reality,
its size would be greater than theirs by an infinite factor.)


5.4 Curvature tensors 


The example of the flea suggests that if we want to express curvature 
as a tensor, it should have even rank. Also, in a coordinate system 
in which the coordinates have units of distance (they are not angles, 
for instance, as in spherical coordinates), we expect that the units 
of curvature will always be inverse distance squared. Another way 
of putting this is that if we start with normal coordinates and then 
rescale all the coordinates by a factor of $\mu$, a curvature tensor should
scale down by $\mu^{-2}$. (See section 5.11, p. 202, for more on this topic.)


Combining these two facts, we find that a curvature tensor should
have one of the forms $R_{ab}$, $R^a{}_{bcd}$, ..., i.e., the number of lower
indices should be two greater than the number of upper indices. The
following definition has this property, and is equivalent to the earlier 
definitions of the Gaussian curvature that were not written in tensor 
notation. 


Definition of the Riemann curvature tensor: Let $dp^c$ and $dq^d$
be two infinitesimal vectors, and use them to form a quadrilateral
that is a good approximation to a parallelogram.⁶ Parallel-transport
vector $v^b$ all the way around the parallelogram. When it comes back
to its starting place, it has a new value $v^b \to v^b + dv^b$. Then the
Riemann curvature tensor is defined as the tensor that computes $dv^a$
according to $dv^a = R^a{}_{bcd}\,v^b\,dp^c\,dq^d$. (There is no standardization in
the literature of the order of the indices.)


A symmetry of the Riemann tensor Example: 7
If vectors $dp^c$ and $dq^d$ lie along the same line, then $dv^a$ must
vanish, and interchanging $dp^c$ and $dq^d$ simply reverses the direction
of the circuit around the quadrilateral, giving $dv^a \to -dv^a$. This
shows that $R^a{}_{bcd}$ must be antisymmetric under interchange of the
indices $c$ and $d$, $R^a{}_{bcd} = -R^a{}_{bdc}$.


In local normal coordinates, the interpretation of the Riemann
tensor becomes particularly transparent. The constant-coordinate
lines are geodesics, so when the vector $v^b$ is transported along them,
it maintains a constant angle with respect to them. Any rotation
of the vector after it is brought around the perimeter of the
quadrilateral can therefore be attributed to something that happens at
the vertices. In other words, it is simply a measure of the angular
defect. We can therefore see that the Riemann tensor is really just
a tensorial way of writing the Gaussian curvature $K = d\epsilon/dA$.


In normal coordinates, the local geometry is nearly Cartesian,
and when we take the product of two vectors in an antisymmetric
manner, we are essentially measuring the area of the parallelogram
they span, as in the three-dimensional vector cross product. We can
therefore see that the Riemann tensor tells us something about the
amount of curvature contained within the infinitesimal area spanned
by $dp^c$ and $dq^d$. A finite two-dimensional region can be broken
down into infinitesimal elements of area, and the Riemann tensor
integrated over them. The result is equal to the finite change $\Delta v^b$
in a vector transported around the whole boundary of the region.


⁶Section 5.8 discusses the sense in which this approximation is good enough.


Curvature tensors on a sphere Example: 8 
Let’s find the curvature tensors on a sphere of radius p. 


Construct normal coordinates $(x, y)$ with origin O, and let vectors
$dp^c$ and $dq^d$ represent infinitesimal displacements along $x$
and $y$, forming a quadrilateral as described above. Then $R^x{}_{yxy}$
represents the change in the $x$ direction that occurs in a vector
that is initially in the $y$ direction. If the vector has unit magnitude,
then $R^x{}_{yxy}$ equals the angular deficit of the quadrilateral per unit
area. Comparing with the definition of the Gaussian curvature, we find
$R^x{}_{yxy} = K = 1/\rho^2$. Interchanging $x$ and $y$, we find the same result
for $R^y{}_{xyx}$. Thus although the Riemann tensor in two dimensions
has sixteen components, only these two, along with the two related to
them by the antisymmetry of example 7, are nonzero, and they
are equal to each other in magnitude.


This result represents the defect in parallel transport around a
closed loop per unit area. Suppose we parallel-transport a vector
around an octant, as shown in figure b. The area of the octant
is $(\pi/2)\rho^2$, and multiplying it by the Riemann tensor, we find that
the defect in parallel transport is $\pi/2$, i.e., a right angle, as is also
evident from the figure.
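
The right-angle rotation can also be reproduced by brute force in the embedding space. The sketch below relies on the fact that parallel transport along a great-circle arc of a sphere is the same as rigidly rotating the tangent vector about the axis perpendicular to the plane of the arc; that shortcut, the Rodrigues rotation formula, and all function names are assumptions of this illustration rather than part of the text's argument. The path runs from the north pole down to the equator, a quarter of the way around the equator, and back up to the pole.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def rotate(v, axis, angle):
    """Rodrigues rotation of v about a unit axis by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    k_cross_v = cross(axis, v)
    k_dot_v = dot(axis, v)
    return tuple(v[i] * c + k_cross_v[i] * s + axis[i] * k_dot_v * (1.0 - c)
                 for i in range(3))


def transport(v, start, end):
    """Parallel-transport v along the great-circle arc from start to end.

    start and end are unit vectors less than half a circle apart.
    """
    normal = cross(start, end)
    norm = math.sqrt(dot(normal, normal))
    axis = tuple(n / norm for n in normal)
    angle = math.atan2(norm, dot(start, end))
    return rotate(v, axis, angle)


north = (0.0, 0.0, 1.0)
a = (1.0, 0.0, 0.0)   # point on the equator
b = (0.0, 1.0, 0.0)   # a quarter of the way around the equator

v_start = (1.0, 0.0, 0.0)   # tangent to the sphere at the north pole
v = v_start
for start, end in [(north, a), (a, b), (b, north)]:
    v = transport(v, start, end)

rotation = math.acos(max(-1.0, min(1.0, dot(v_start, v))))

rho = 1.0
K = 1.0 / rho**2
octant_area = (math.pi / 2.0) * rho**2

print(f"rotation after the loop: {rotation:.6f} rad")
print(f"K times octant area:     {K * octant_area:.6f} rad")
```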


The above treatment may be somewhat misleading in that it may 
lead you to believe that there is a single coordinate system in 
which the Riemann tensor is always constant. This is not the 
case, since the calculation of the Riemann tensor was only valid 
near the origin O of the normal coordinates. The character of 
these coordinates becomes quite complicated far from O; we end 
up with all our constant-x lines converging at north and south 
poles of the sphere, and all the constant-y lines at east and west 
poles. 


Angular coordinates $(\phi, \theta)$ are more suitable as a large-scale
description of the sphere. We can use the tensor transformation law
to find the Riemann tensor in these coordinates. If O, the origin
of the $(x, y)$ coordinates, is at coordinates $(\phi, \theta)$, then
$dx/d\phi = \rho\sin\theta$ and $dy/d\theta = \rho$. The result is
$R^\phi{}_{\theta\phi\theta} = R^x{}_{yxy}(dy/d\theta)^2 = 1$
and $R^\theta{}_{\phi\theta\phi} = R^y{}_{xyx}(dx/d\phi)^2 = \sin^2\theta$. The variation in
$R^\theta{}_{\phi\theta\phi}$ is not due to any variation in the sphere’s intrinsic
curvature; it represents the behavior of the coordinate system.
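
The arithmetic of this change of coordinates is simple enough to spell out. In the transformation of $R^x{}_{yxy}$, the factor $\partial\phi/\partial x$ attached to the upper index cancels one factor of $\partial x/\partial\phi$ from a lower index, leaving only $(dy/d\theta)^2$, and similarly for the other component. The sketch below just evaluates the two products; the function name is hypothetical.

```python
import math


def riemann_in_angular_coordinates(theta, rho):
    """Return (R^phi_{theta phi theta}, R^theta_{phi theta phi}) on a sphere."""
    K = 1.0 / rho**2                   # R^x_{yxy} = R^y_{xyx} in normal coordinates
    dx_dphi = rho * math.sin(theta)
    dy_dtheta = rho
    return K * dy_dtheta**2, K * dx_dphi**2


theta = math.pi / 3.0
r_phi, r_theta = riemann_in_angular_coordinates(theta, rho=2.5)

print(f"R^phi_(theta phi theta) = {r_phi:.6f}")
print(f"R^theta_(phi theta phi) = {r_theta:.6f}  (sin^2 theta = {math.sin(theta)**2:.6f})")
```

Neither component depends on $\rho$, which reflects the fact that angular coordinates are unitless.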


The Riemann tensor only measures curvature within a particular
plane, the one defined by $dp^c$ and $dq^d$, so it is a kind of sectional
curvature. Since we’re currently working in two dimensions, however,
there is only one plane, and no real distinction between sectional
curvature and Ricci curvature, which is the average of the sectional
curvature over all planes that include $dq^d$: $R_{cd} = R^a{}_{cad}$. The Ricci
curvature in two spacelike dimensions, expressed in normal
coordinates, is simply the diagonal matrix $\mathrm{diag}(K, K)$.


b / The change in the vector due to parallel transport around
the octant equals the integral of the Riemann tensor over the
interior.


a / The geodetic effect measured by Gravity Probe B.


5.5 Some order-of-magnitude estimates 


As a general proposition, calculating an order-of-magnitude estimate 
of a physical effect requires an understanding of 50% of the physics, 
while an exact calculation requires about 75%.⁷ We’ve reached
the point where it’s reasonable to attempt a variety of order-of- 
magnitude estimates. 


5.5.1 The geodetic effect 


How could we confirm experimentally that parallel transport 
around a closed path can cause a vector to rotate? The rotation 
is related to the amount of spacetime curvature contained within 
the path, so it would make sense to choose a loop going around 
a gravitating body. The rotation is a purely relativistic effect, so 
we expect it to be small. To make it easier to detect, we should 
go around the loop many times, causing the effect to accumulate. 
This is essentially a description of a body orbiting another body. A 
gyroscope aboard the orbiting body is expected to precess. This is 
known as the geodetic effect. In 1916, shortly after Einstein pub- 
lished the general theory of relativity, Willem de Sitter calculated 
the effect on the earth-moon system. The effect was not directly 
verified until the 1980’s, and the first high-precision measurement 
was in 2007, from analysis of the results collected by the Gravity 
Probe B satellite experiment. The probe carried four gyroscopes 
made of quartz, which were the most perfect spheres ever manu- 
factured, varying from sphericity by no more than about 40 atoms. 


Let’s estimate the size of the effect. The first derivative of the
metric is, roughly, the gravitational field, whereas the second
derivative has to do with curvature. The curvature of spacetime around
the earth should therefore vary as $GMr^{-3}$, where $M$ is the earth’s
mass and $G$ is the gravitational constant. The area enclosed by a
circular orbit is proportional to $r^2$, so we expect the geodetic effect
to vary as $nGM/r$, where $n$ is the number of orbits. The angle of
precession is unitless, and the only way to make this result unitless
is to put in a factor of $1/c^2$. In units with $c = 1$, this factor is
unnecessary. In ordinary metric units, the $1/c^2$ makes sense, because
it causes the purely relativistic effect to come out to be small. The
result, up to unitless factors that we didn’t pretend to find, is

$$\Delta\theta \sim \frac{nGM}{c^2r} .$$


⁷This statement is itself only a rough estimate. Anyone who has taught
physics knows that students will often calculate an effect exactly while not
understanding the underlying physics at all.


We might also expect a Thomas precession. Like the spacetime
curvature effect, it would be proportional to $nGM/c^2r$. Since we’re
not worrying about unitless factors, we can just lump the Thomas
precession together with the effect already calculated.


The data for Gravity Probe B are $r = r_e + (650\ \mathrm{km})$ and $n \approx 5000$
(orbiting once every 90 minutes for the 353-day duration of the
experiment), giving $\Delta\theta \sim 3 \times 10^{-6}$ radians. Figure b shows the
actual results⁸ for the four gyroscopes aboard the probe. The precession
was about 6 arc-seconds, or $3 \times 10^{-5}$ radians. Our crude estimate
was on the right order of magnitude. The missing unitless factor on
the right-hand side of the equation above is $3\pi$, which brings the two
results into fairly close quantitative agreement. The full derivation,
including the factor of $3\pi$, is given on page 224.
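
The numbers in this estimate are easy to reproduce. The sketch below plugs the Gravity Probe B data into $\Delta\theta \sim nGM/c^2r$ and then applies the factor of $3\pi$. The values used for $GM$ of the earth, the speed of light, and the earth's mean radius are standard reference values supplied here as assumptions, and the variable names are hypothetical.

```python
import math

GM_EARTH = 3.986004418e14   # m^3/s^2, gravitational parameter of the earth (assumed value)
C = 299792458.0             # m/s, speed of light
R_EARTH = 6.371e6           # m, mean radius of the earth (assumed value for r_e)

n_orbits = 5000
r = R_EARTH + 650e3         # orbital radius in meters

estimate = n_orbits * GM_EARTH / (C**2 * r)   # crude estimate, radians
with_factor = 3.0 * math.pi * estimate        # including the unitless factor

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi
with_factor_arcsec = with_factor * ARCSEC_PER_RADIAN

print(f"crude estimate:     {estimate:.2e} rad")
print(f"with factor of 3pi: {with_factor:.2e} rad = {with_factor_arcsec:.1f} arcsec")
```

The second line of output comes out close to the measured precession of about 6 arc-seconds.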


b / Precession angle as a function of time as measured by the four gyroscopes aboard Gravity Probe B. (Each panel shows the NS inertial orientation of one gyroscope, in arcsec, versus date.)


5.5.2 Deflection of light rays 


In the discussion of the momentum four vector in section 4.2.2, 
we saw that due to the equivalence principle, light must be affected 
by gravity. There are two ways in which such an effect could occur. 
Light can gain and lose momentum as it travels up and down in 
a gravitational field, or its momentum vector can be deflected by 
a transverse gravitational field. As an example of the latter, a ray 
of starlight can be deflected by the sun’s gravity, causing the star’s 
apparent position in the sky to be shifted. The detection of this 
effect was one of the first experimental tests of general relativity. 
Ordinarily the bright light from the sun would make it impossible 
to accurately measure a star’s location on the celestial sphere, but 
this problem was sidestepped by Arthur Eddington during an eclipse 
of the sun in 1919. 


⁸arxiv.org/abs/1105.3456


c / One of the photos from Eddington’s observations of the
1919 eclipse. This is a photographic negative, so the circle
that appears bright is actually the dark face of the moon, and
the dark area is really the bright corona of the sun. The stars,
marked by lines above and below them, appeared at positions
slightly different than their normal ones, indicating that their light
had been bent by the sun’s gravity on its way to our planet.


Let’s estimate the size of this effect. We’ve already seen that
the Riemann tensor is essentially just a tensorial way of writing
the Gaussian curvature $K = d\epsilon/dA$. Suppose, for the sake of this
rough estimate, that the sun, earth, and star form a non-Euclidean
triangle with a right angle at the sun. Then the angular deflection
is the same as the angular defect $\epsilon$ of this triangle, and equals the
integral of the curvature over the interior of the triangle. Ignoring
unitless constants, this ends up being exactly the same calculation
as in section 5.5.1, and the result is $\epsilon \sim GM/c^2r$, where $r$ is the
light ray’s distance of closest approach to the sun. The value of $r$
can’t be less than the radius of the sun, so the maximum size of the
effect is on the order of $GM/c^2r$, where $M$ is the sun’s mass, and $r$
is its radius. We find $\epsilon \sim 10^{-5}$ radians, or about a second of arc. To
measure a star’s position to within an arc second was well within 
the state of the art in 1919, under good conditions in a comfortable 
observatory. This observation, however, required that Eddington’s 
team travel to the island of Principe, off the coast of West Africa. 
The weather was cloudy, and only during the last 10 seconds of the 
seven-minute eclipse did the sky clear enough to allow photographic 
plates to be taken of the Hyades star cluster against the background 
of the eclipse-darkened sky. The observed deflection was 1.6 seconds 
of arc, in agreement with the relativistic prediction. The relativistic 
prediction is derived on page 233. 
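
For readers who want to see the numbers, here is a sketch of the estimate for a ray grazing the sun. The solar values of $GM$ and radius are standard reference values supplied as assumptions, and the variable names are hypothetical. The factor of 4 applied at the end is the standard general-relativistic coefficient for light deflection, quoted here as outside knowledge since the unitless constant was deliberately ignored above.

```python
import math

GM_SUN = 1.32712440018e20   # m^3/s^2, gravitational parameter of the sun (assumed value)
C = 299792458.0             # m/s, speed of light
R_SUN = 6.957e8             # m, radius of the sun (assumed value)

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi

estimate = GM_SUN / (C**2 * R_SUN)            # order-of-magnitude estimate, radians
estimate_arcsec = estimate * ARCSEC_PER_RADIAN

full_arcsec = 4.0 * estimate_arcsec           # with the standard coefficient of 4

print(f"GM/(c^2 r)   = {estimate:.2e} rad = {estimate_arcsec:.2f} arcsec")
print(f"4 GM/(c^2 r) = {full_arcsec:.2f} arcsec")
```

The bare estimate lands within a factor of a few of a second of arc, and with the coefficient included the result is close to the 1.6 seconds of arc that was observed.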


5.6 The covariant derivative 


In the preceding section we were able to estimate a nontrivial general 
relativistic effect, the geodetic precession of the gyroscopes aboard 
Gravity Probe B, up to a unitless constant $3\pi$. Let’s think about
what additional machinery would be needed in order to carry out
the calculation in detail, including the $3\pi$.


First we would need to know the Einstein field equation, but in a 
vacuum this is fairly straightforward: $R_{ab} = 0$. Einstein posited this
equation based essentially on the considerations laid out in section 
5.1. 


But just knowing that a certain tensor vanishes identically in the 
space surrounding the earth clearly doesn’t tell us anything explicit 
about the structure of the spacetime in that region. We want to 
know the metric. As suggested at the beginning of the chapter, we 
expect that the first derivatives of the metric will give a quantity 
analogous to the gravitational field of Newtonian mechanics, but this 
quantity will not be directly observable, and will not be a tensor. 
The second derivatives of the metric are the ones that we expect to 
relate to the Ricci tensor $R_{ab}$.


5.6.1 The covariant derivative in electromagnetism 


We’re talking blithely about derivatives, but it’s not obvious how 
to define a derivative in the context of general relativity in such a 
way that taking a derivative results in a well-behaved tensor.


To see how this issue arises, let’s retreat to the more familiar 
terrain of electromagnetism. In quantum mechanics, the phase of a 
charged particle’s wavefunction is unobservable, so that for example 
the transformation $\Psi \to -\Psi$ does not change the results of
experiments. As a less trivial example, we can redefine the ground of our
electrical potential, $\Phi \to \Phi + \delta\Phi$, and this will add a constant onto
the energy of every electron in the universe, causing their phases to
oscillate at a greater rate due to the quantum-mechanical relation
$E = hf$. There are no observable consequences, however, because
what is observable is the phase of one electron relative to another, 
as in a double-slit interference experiment. Since every electron has 
been made to oscillate faster, the effect is simply like letting the con- 
ductor of an orchestra wave her baton more quickly; every musician 
is still in step with every other musician. The rate of change of the 
wavefunction, i.e., its derivative, has some built-in ambiguity. 


For simplicity, let’s now restrict ourselves to spin-zero particles,
since details of electrons’ polarization clearly won’t tell us
anything useful when we make the analogy with relativity. For a
spin-zero particle, the wavefunction is simply a complex number,
and there are no observable consequences arising from the
transformation $\Psi \to \Psi' = e^{i\alpha}\Psi$, where $\alpha$ is a constant. The transformation
$\Phi \to \Phi - \delta\Phi$ is also allowed, and it gives $\alpha(t) = (q\,\delta\Phi/\hbar)t$, so that
the phase factor $e^{i\alpha(t)}$ is a function of time $t$. Now from the point
of view of electromagnetism in the age of Maxwell, with the electric
and magnetic fields imagined as playing their roles against a
background of Euclidean space and absolute time, the form of this
time-dependent phase factor is very special and symmetrical; it
depends only on the absolute time variable. But to a relativist, there is
nothing very nice about this function at all, because there is nothing
special about a time coordinate. If we’re going to allow a function
of this form, then based on the coordinate-invariance of relativity, it
seems that we should probably allow $\alpha$ to be any function at all of
the spacetime coordinates. The proper generalization of $\Phi \to \Phi - \delta\Phi$
is now $A_b \to A_b - \partial_b\alpha$, where $A_b$ is the electromagnetic potential
four-vector (section 4.2.5, page 137).


a / A double-slit experiment with electrons. If we add an
arbitrary constant to the potential, no observable changes result.
The wavelength is shortened, but the relative phase of the two parts
of the waves stays the same.


b / Two wavefunctions with constant wavelengths, and a
third with a varying wavelength. None of these are physically
distinguishable, provided that the same variation in wavelength is
applied to all electrons in the universe at any given point in
spacetime. There is not even any unambiguous way to pick out
the third one as the one with a varying wavelength. We could
choose a different gauge in which the third wave was the only one
with a constant wavelength.


Self-check: Suppose we said we would allow $\alpha$ to be a function
of $t$, but forbid it to depend on the spatial coordinates. Prove that
this would violate Lorentz invariance.


The transformation has no effect on the electromagnetic fields, 
which are the direct observables. We can also verify that the change 
of gauge will have no effect on observable behavior of charged par- 
ticles. This is because the phase of a wavefunction can only be 
determined relative to the phase of another particle’s wavefunction, 
when they occupy the same point in space and, for example, inter- 
fere. Since the phase shift depends only on the location in spacetime, 
there is no change in the relative phase. 


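But bad things will happen if we don’t make a corresponding
adjustment to the derivatives appearing in the Schrödinger equation.
These derivatives are essentially the momentum operators, and they
give different results when applied to $\Psi'$ than when applied to $\Psi$:


$$\begin{aligned}
\partial_b\Psi \rightarrow{}& \partial_b\left(e^{i\alpha}\Psi\right) \\
={}& e^{i\alpha}\partial_b\Psi + i\,\partial_b\alpha\left(e^{i\alpha}\Psi\right) \\
={}& \left(\partial_b + A'_b - A_b\right)\Psi'
\end{aligned}$$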


To avoid getting incorrect results, we have to do the substitution
$\partial_b \rightarrow \partial_b + ieA_b$, where the correction term compensates for the
change of gauge. We call the operator $\nabla$ defined as


$$\nabla_b = \partial_b + ieA_b$$


the covariant derivative. It gives the right answer regardless of a 
change of gauge. 
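

The claim that the covariant derivative is insensitive to the choice of gauge can be checked numerically. The following is only a sketch under stated assumptions: one dimension, units in which e = 1, the transformation $\Psi' = e^{i\alpha}\Psi$ with $A' = A - \partial\alpha$, and made-up functions `psi`, `A`, and `alpha` that are not taken from the text.

```python
import cmath
import math

def psi(x):
    """Hypothetical wavefunction, chosen only for illustration."""
    return cmath.exp(2j * x) * (1.0 + 0.3 * x)

def A(x):
    """Hypothetical potential component along the single coordinate x."""
    return 0.5 * math.sin(x)

def alpha(x):
    """Hypothetical gauge function alpha(x)."""
    return x ** 2

def d(f, x, h=1e-6):
    """Central-difference estimate of the ordinary derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def psi_prime(x):
    """Gauge-transformed wavefunction, Psi' = exp(i alpha) Psi."""
    return cmath.exp(1j * alpha(x)) * psi(x)

def A_prime(x):
    """Gauge-transformed potential, A' = A - d(alpha)/dx."""
    return A(x) - d(alpha, x)

def covariant(f, potential, x):
    """Covariant derivative (d/dx + i A) f, assuming e = 1."""
    return d(f, x) + 1j * potential(x) * f(x)

x0 = 0.7
phase = cmath.exp(1j * alpha(x0))

# Ordinary derivative: the two gauges disagree by more than a phase factor.
plain_mismatch = abs(d(psi_prime, x0) - phase * d(psi, x0))

# Covariant derivative: the two gauges differ only by the phase factor.
covariant_mismatch = abs(covariant(psi_prime, A_prime, x0)
                         - phase * covariant(psi, A, x0))

print(plain_mismatch, covariant_mismatch)
```

With these choices the ordinary derivative picks up an extra term of size $|\partial\alpha\,\Psi|$, while the covariant derivative agrees between the two gauges up to finite-difference error.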


5.6.2 The covariant derivative in general relativity 


Now consider how all of this plays out in the context of gen- 
eral relativity. The gauge transformations of general relativity are 
arbitrary smooth changes of coordinates. One of the most basic 
properties we could require of a derivative operator is that it must 
give zero on a constant function. A constant scalar function remains 
constant when expressed in a new coordinate system, but the same 
is not true for a constant vector function, or for any tensor of higher 
rank. This is because the change of coordinates changes the units
in which the vector is measured, and if the change of coordinates is
nonlinear, the units vary from point to point. 


Consider the one-dimensional case, in which a vector $v^a$ has only
one component, and the metric is also a single number, so that we
can omit the indices and simply write v and g. (We just have to
remember that v is really a contravariant vector, even though we’re
leaving out the upper index.) If v is constant, its derivative dv/dx,
computed in the ordinary way without any correction term, is zero.
If we further assume that the coordinate x is a normal coordinate, so
that the metric is simply the constant g = 1, then zero is not just the
answer but the right answer. (The existence of a preferred, global
set of normal coordinates is a special feature of a one-dimensional
space, because there is no curvature in one dimension. In more than
one dimension, there will typically be no possible set of coordinates
in which the metric is constant, and normal coordinates only give a
metric that is approximately constant in the neighborhood around
a certain point. See figure g on page 164 for an example of normal
coordinates on a sphere, which do not have a constant metric.)


Now suppose we transform into a new coordinate system X,
which is not normal. The metric G, expressed in this coordinate
system, is not constant. Applying the tensor transformation law,
we have V = v dX/dx, and differentiation with respect to X will
not give zero, because the factor dX/dx isn’t constant. This is the
wrong answer: V isn’t really varying, it just appears to vary because
G does.


We want to add a correction term onto the derivative operator
d/dX, forming a covariant derivative operator $\nabla_X$ that gives the
right answer. This correction term is easy to find if we consider
what the result ought to be when differentiating the metric itself.
In general, if a tensor appears to vary, it could vary either because
it really does vary or because the metric varies. If the metric itself
varies, it could be either because the metric really does vary or
... because the metric varies. In other words, there is no sensible
way to assign a nonzero covariant derivative to the metric itself, so
we must have $\nabla_X G = 0$. The required correction therefore consists
of replacing d/dX with


$$\nabla_X = \frac{d}{dX} - G^{-1}\frac{dG}{dX}$$


Applying this to G gives zero. G is a second-rank covariant
tensor. If we apply the same correction to the derivatives of other
second-rank covariant tensors, we will get nonzero results, and
they will be the right nonzero results. For example, the covariant
derivative of the stress-energy tensor T (assuming such a thing could
have some physical significance in one dimension!) will be
$\nabla_X T = dT/dX - G^{-1}(dG/dX)T$.
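

As a rough numerical illustration of this correction term, the sketch below assumes a particular nonlinear change of coordinate, X = exp(x), which is our own choice and not one made in the text. A tensor with two lower indices that is constant in the normal coordinate x then has a non-constant component in X, but the corrected derivative still gives zero.

```python
def dx_dX(X):
    """dx/dX for the assumed coordinate change X = exp(x), i.e. x = ln X."""
    return 1.0 / X

def G(X):
    """Metric in the X coordinate: G = g (dx/dX)^2 with g = 1."""
    return dx_dX(X) ** 2

T_NORMAL = 3.0  # hypothetical constant component of T in the normal coordinate

def T(X):
    """The same tensor (two lower indices) expressed in the X coordinate."""
    return T_NORMAL * dx_dX(X) ** 2

def ddX(f, X, h=1e-6):
    """Central-difference estimate of df/dX."""
    return (f(X + h) - f(X - h)) / (2 * h)

def cov_deriv_rank2(f, X):
    """d/dX minus the logarithmic derivative of G, applied to f."""
    return ddX(f, X) - (ddX(G, X) / G(X)) * f(X)

X0 = 2.0
plain_T = ddX(T, X0)            # nonzero: the "wrong answer"
cov_G = cov_deriv_rank2(G, X0)  # zero up to numerical error
cov_T = cov_deriv_rank2(T, X0)  # zero up to numerical error

print(plain_T, cov_G, cov_T)
```

The uncorrected derivative of T comes out nonzero purely because of the coordinate change, while both corrected derivatives vanish to within finite-difference error.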


Physically, the correction term is a derivative of the metric, and
we’ve already seen that the derivatives of the metric (1) are the closest
thing we get in general relativity to the gravitational field, and
(2) are not tensors. In 1+1 dimensions, suppose we observe that a
free-falling rock has dV/dT = 9.8 m/s². This acceleration cannot be
a tensor, because we could make it vanish by changing from Earth-fixed
coordinates X to free-falling (normal, locally Lorentzian) coordinates
x, and a tensor cannot be made to vanish by a change of
coordinates. According to a free-falling observer, the vector v isn’t
changing at all; it is only the variation in the Earth-fixed observer’s
metric G that makes it appear to change.


c / These three rulers represent three choices of coordinates. As
in figure b on page 173, switching from one set of coordinates to
another has no effect on any experimental observables. It is
merely a choice of gauge.


Mathematically, the form of the derivative is (1/y) dy/dx, which
is known as a logarithmic derivative, since it equals d(ln y)/dx. It
measures the multiplicative rate of change of y. For example, if
y scales up by a factor of k when x increases by 1 unit, then the
logarithmic derivative of y is ln k. The logarithmic derivative of
$e^{cx}$ is c. The logarithmic nature of the correction term to $\nabla_X$ is a
good thing, because it lets us take changes of scale, which are
multiplicative changes, and convert them to additive corrections to the
derivative operator. The additivity of the corrections is necessary if
the result of a covariant derivative is to be a tensor, since tensors
are additive creatures.
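

The two examples in this paragraph are easy to confirm numerically. This is a minimal sketch; the values of k and c and the point of evaluation are arbitrary choices of ours.

```python
import math

def log_derivative(y, x, h=1e-6):
    """Central-difference estimate of (1/y) dy/dx at x."""
    return (y(x + h) - y(x - h)) / (2 * h) / y(x)

k, c = 3.0, 0.7

# y grows by a factor of k per unit x, so we expect ln k.
growth = log_derivative(lambda x: k ** x, 1.5)

# y = exp(c x), so we expect c.
expo = log_derivative(lambda x: math.exp(c * x), 1.5)

# Multiplying y by a constant factor leaves the logarithmic derivative alone.
rescaled = log_derivative(lambda x: 10.0 * k ** x, 1.5)

print(growth, math.log(k), expo, rescaled)
```

The last line illustrates the point about changes of scale: a constant multiplicative rescaling contributes nothing, and a varying one contributes an additive term.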


What about quantities that are not second-rank covariant tensors?
Under a rescaling of contravariant coordinates by a factor of
k, covariant vectors scale by $k^{-1}$, and second-rank covariant tensors
by $k^{-2}$. The correction term should therefore be half as much for
covariant vectors,


$$\nabla_X = \frac{d}{dX} - \frac{1}{2}G^{-1}\frac{dG}{dX}$$


and should have an opposite sign for contravariant vectors.
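

Here is a small check of the factor of one half and of the sign flip, again in one dimension. The coordinate change x = X^(1/3) and the constant v are assumptions made only for this sketch.

```python
def s(X):
    """dx/dX for the assumed coordinate change x = X**(1/3)."""
    return X ** (-2.0 / 3.0) / 3.0

def G(X):
    """Metric in the X coordinate, G = (dx/dX)^2, taking g = 1."""
    return s(X) ** 2

def ddX(f, X, h=1e-6):
    """Central-difference estimate of df/dX."""
    return (f(X + h) - f(X - h)) / (2 * h)

def log_dG(X):
    """Logarithmic derivative of the metric, (1/G) dG/dX."""
    return ddX(G, X) / G(X)

v = 5.0  # hypothetical vector component, constant in the normal coordinate x

def lower(X):
    """Covariant (lower-index) component in X: v dx/dX."""
    return v * s(X)

def upper(X):
    """Contravariant (upper-index) component in X: v dX/dx."""
    return v / s(X)

def nabla_lower(f, X):
    """Half of the rank-2 correction, with the same sign."""
    return ddX(f, X) - 0.5 * log_dG(X) * f(X)

def nabla_upper(f, X):
    """Half of the rank-2 correction, with the opposite sign."""
    return ddX(f, X) + 0.5 * log_dG(X) * f(X)

X0 = 2.0
print(ddX(lower, X0), nabla_lower(lower, X0))
print(ddX(upper, X0), nabla_upper(upper, X0))
```

In both cases the ordinary derivative is nonzero, while the corrected one vanishes to within numerical error, as it should for a vector that is constant in the normal coordinate.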


Generalizing the correction term to derivatives of vectors in more
than one dimension, we should have something of this form:


$$\begin{aligned}
\nabla_a v^b &= \partial_a v^b + \Gamma^b{}_{ac}v^c \\
\nabla_a v_b &= \partial_a v_b - \Gamma^c{}_{ba}v_c
\end{aligned}$$


where $\Gamma^b{}_{ac}$, called the Christoffel symbol, does not transform like
a tensor, and involves derivatives of the metric. (“Christoffel” is 
pronounced “Krist-AWful,” with the accent on the middle syllable.) 
The explicit computation of the Christoffel symbols from the metric 
is deferred until section 5.9, but the intervening sections 5.7 and 5.8 
can be omitted on a first reading without loss of continuity. 


An important gotcha is that when we evaluate a particular component
of a covariant derivative such as $\nabla_2 v^3$, it is possible for the
result to be nonzero even if the component $v^3$ vanishes identically.
This can be seen in example 5 on p. 305 and example 21 on p. 345.


Christoffel symbols on the globe Example: 9
As a qualitative example, consider the geodesic airplane trajectory
shown in figure d, from London to Mexico City. In physics
it is customary to work with the colatitude, $\theta$, measured down
from the north pole, rather than the latitude, measured from the
equator. At P, over the North Atlantic, the plane’s colatitude has
a minimum. (We can see, without having to take it on faith from
the figure, that such a minimum must occur. The easiest way to
convince oneself of this is to consider a path that goes directly
over the pole, at $\theta = 0$.)


At P, the plane’s velocity vector points directly west. At Q, over
New England, its velocity has a large component to the south.
Since the path is a geodesic and the plane has constant speed,
the velocity vector is simply being parallel-transported; the vector’s
covariant derivative is zero. Since we have $v^\theta = 0$ at P, the
only way to explain the nonzero and positive value of $\partial_\phi v^\theta$ is that
we have a nonzero and negative value of $\Gamma^\theta{}_{\phi\phi}$.


By symmetry, we can infer that $\Gamma^\theta{}_{\phi\phi}$ must have a positive value
in the southern hemisphere, and must vanish at the equator.


$\Gamma^\theta{}_{\phi\phi}$ is computed in example 11 on page 189.


Symmetry also requires that this Christoffel symbol be independent
of $\phi$, and it must also be independent of the radius of the
sphere.
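

These qualitative conclusions can be compared with a direct numerical evaluation. The sketch below assumes the standard expression for the Christoffel symbols in terms of the metric, which the text does not derive until section 5.9; for the diagonal metric of a sphere it reduces to $\Gamma^\theta{}_{\phi\phi} = -\frac{1}{2}g^{\theta\theta}\partial_\theta g_{\phi\phi}$. The function name and the sample colatitudes are ours.

```python
import math

def gamma_theta_phiphi(theta, radius=1.0, h=1e-6):
    """Gamma^theta_{phi phi} on a sphere, from -(1/2) g^{theta theta} d(g_{phi phi})/d(theta).

    Assumes the metric g_{theta theta} = R^2, g_{phi phi} = R^2 sin^2(theta).
    """
    g_thth = radius ** 2

    def g_phph(th):
        return (radius * math.sin(th)) ** 2

    dg = (g_phph(theta + h) - g_phph(theta - h)) / (2 * h)
    return -0.5 * dg / g_thth

north = gamma_theta_phiphi(math.radians(40.0))    # northern hemisphere
equator = gamma_theta_phiphi(math.pi / 2)         # on the equator
south = gamma_theta_phiphi(math.radians(140.0))   # southern hemisphere
big = gamma_theta_phiphi(math.radians(40.0), radius=6.4e6)  # roughly Earth-sized

print(north, equator, south, big)
```

The result is negative in the north, zero at the equator, positive in the south, and unchanged when the radius is changed. The longitude never enters the calculation.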


Example 9 is in two spatial dimensions. In spacetime, $\Gamma$ is essentially
the gravitational field (see problem 7, p. 209), and early
papers in relativity essentially refer to it that way.⁹ This may feel
like a joyous reunion with our old friend from freshman mechanics,
g = 9.8 m/s². But our old friend has changed. In Newtonian mechanics,
accelerations like g are frame-invariant (considering only
inertial frames, which are the only legitimate ones in that theory).
In general relativity they are frame-dependent, and as we saw on
page 176, the acceleration of gravity can be made to equal anything
we like, based on our choice of a frame of reference.


Not a tensor Example: 10 
Here are a couple of intuitive explanations of why the Christoffel 
symbol cannot be a tensor. Both of them employ the fact that if 
a tensor is zero in one set of coordinates, it is zero in others as 
well. 


In general relativity, $\Gamma$ is essentially the gravitational field. But we
can always find a free-falling frame of reference, corresponding
locally to some coordinate system, in which the gravitational field
is zero. Therefore if $\Gamma$ were a tensor, it would have to vanish
everywhere.


⁹ “On the gravitational field of a point mass according to Einstein’s theory,”
Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften
1 (1916) 189, translated in arxiv.org/abs/physics/9905030v1.


e / Birdtracks notation for the covariant derivative.


For intuition in a broader mathematical context, consider the trans- 
formation in the Euclidean plane from Cartesian coordinates to 
polar coordinates. The Christoffel symbol is zero in Cartesian 
coordinates, but nonzero in polar coordinates (problem 2, page 
209). This would be impossible if $\Gamma$ transformed as a tensor.


By direct calculation, it is possible to show that when we transform
from coordinates (a, b, ...) to new coordinates (x, y, ...), the
change in $\Gamma$ consists of the sum of two terms. The first term is
the change that we would expect for a tensor with one upper and
two lower indices, as suggested by the notation. The second,
nontensorial term modifies a component such as $\Gamma^a{}_{bc}$ by


$$\frac{\partial a}{\partial x}\frac{\partial^2 x}{\partial b\,\partial c}$$


The second derivative is a measure of “acceleration,” or, more
generally, the rate at which the unit vectors change as we move
from point to point. For example, in changing from a Newtonian
inertial frame to a noninertial one, $x' = x + (1/2)at^2$, we would
have a nonzero second derivative $\partial^2 x'/\partial t^2$.


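We have started by discussing the covariant derivative of an
upper-index vector. To compute the covariant derivative of a higher-rank
tensor, we just add more correction terms, e.g.,


$$\nabla_a U_{bc} = \partial_a U_{bc} - \Gamma^d{}_{ba}U_{dc} - \Gamma^d{}_{ca}U_{bd}$$

or

$$\nabla_a U_b{}^c = \partial_a U_b{}^c - \Gamma^d{}_{ba}U_d{}^c + \Gamma^c{}_{ad}U_b{}^d$$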
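

As a sanity check on the rule of one correction term per index, the following sketch applies the two-lower-index formula to the metric of a unit sphere and confirms that every component of the result vanishes. It assumes the sphere's Christoffel symbols, $\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi{}_{\theta\phi} = \Gamma^\phi{}_{\phi\theta} = \cot\theta$, which are standard results not yet derived at this point in the text. Indices 0 and 1 stand for $\theta$ and $\phi$.

```python
import math

def metric(theta):
    """Components g[b][c] of the unit-sphere metric at colatitude theta."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def christoffel(theta):
    """Gamma[d][b][a] on the unit sphere (assumed, not derived here)."""
    gam = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    gam[0][1][1] = -math.sin(theta) * math.cos(theta)
    gam[1][0][1] = gam[1][1][0] = math.cos(theta) / math.sin(theta)
    return gam

def partial(U, a, theta, h=1e-6):
    """Partial derivative of the components U(theta)[b][c] along coordinate a."""
    if a == 1:  # nothing in this example depends on phi
        return [[0.0, 0.0], [0.0, 0.0]]
    up, dn = U(theta + h), U(theta - h)
    return [[(up[b][c] - dn[b][c]) / (2 * h) for c in range(2)]
            for b in range(2)]

def cov_deriv_lower2(U, theta):
    """nabla_a U_bc = d_a U_bc - Gamma^d_ba U_dc - Gamma^d_ca U_bd."""
    gam, val = christoffel(theta), U(theta)
    out = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    for a in range(2):
        dU = partial(U, a, theta)
        for b in range(2):
            for c in range(2):
                corr = sum(gam[d][b][a] * val[d][c] + gam[d][c][a] * val[b][d]
                           for d in range(2))
                out[a][b][c] = dU[b][c] - corr
    return out

nabla_g = cov_deriv_lower2(metric, 1.0)
largest = max(abs(x) for plane in nabla_g for row in plane for x in row)
print(largest)
```

The largest component is zero to within finite-difference error, consistent with the requirement that the covariant derivative of the metric vanish.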


With the partial derivative $\partial_a$, it does not make sense to use the
metric to raise the index and form $\partial^a$. It does make sense to do so
with covariant derivatives, so $\nabla^a = g^{ab}\nabla_b$ is a correct identity.


Comma, semicolon, and birdtracks notation 


Some authors use superscripts with commas and semicolons to 
indicate partial and covariant derivatives. The following equations 
give equivalent notations for the same derivatives: 


$$\begin{aligned}
\partial_a X_b &= X_{b,a} \\
\nabla_a X_b &= X_{b;a} \\
\nabla^a X_b &= X_b{}^{;a}
\end{aligned}$$


Figure e shows two examples of the corresponding birdtracks no- 
tation. Because birdtracks are meant to be manifestly coordinate- 
independent, they do not have a way of expressing non-covariant 
derivatives. We no longer want to use the circle as a notation for 
a non-covariant gradient as we did when we first introduced it on 
p. 48.


5.7 The geodesic equation


In this section, which can be skipped at a first reading, we show how 
the Christoffel symbols can be used to find differential equations that 
describe geodesics. 


5.7.1 Characterization of the geodesic 


A geodesic can be defined as a world-line that preserves tangency 
under parallel transport (figure a). This is essentially a mathematical way
of expressing the notion that we have previously expressed more 
informally in terms of “staying on course” or moving “inertially.” 


A curve can be specified by giving functions $x^i(\lambda)$ for its coordinates,
where $\lambda$ is a real parameter. A vector lying tangent to the
curve can then be calculated using partial derivatives, $T^i = \partial x^i/\partial\lambda$.
There are three ways in which a vector function of $\lambda$ could change:
(1) it could change for the trivial reason that the metric is changing,
so that its components changed when expressed in the new metric;
(2) it could change its components perpendicular to the curve; or
(3) it could change its component parallel to the curve. Possibility
1 should not really be considered a change at all, and the definition
of the covariant derivative is specifically designed to be insensitive
to this kind of thing. Possibility 2 cannot apply to $T^i$, which is tangent by
construction. It would therefore be convenient if $T^i$ happened to
be always the same length. If so, then possibility 3 would not happen either,
and we could reexpress the definition of a geodesic by saying that
the covariant derivative of $T^i$ was zero. For this reason, we will
assume for the remainder of this section that the parametrization
of the curve has this property. In a Newtonian context, we could
imagine the $x^i$ to be purely spatial coordinates, and $\lambda$ to be a universal
time coordinate. We would then interpret $T^i$ as the velocity,
and the restriction would be to a parametrization describing motion
with constant speed. In relativity, the restriction is that $\lambda$ must be
an affine parameter. For example, it could be the proper time of a
particle, if the curve in question is timelike.


5.7.2 Covariant derivative with respect to a parameter 


The notation of section 5.6 is not quite adapted to our present
purposes, since it allows us to express a covariant derivative with
respect to one of the coordinates, but not with respect to a parameter
such as $\lambda$. We would like to notate the covariant derivative of
$T^i$ with respect to $\lambda$ as $\nabla_\lambda T^i$, even though $\lambda$ isn’t a coordinate. To
connect the two types of derivatives, we can use a total derivative.
To make the idea clear, here is how we calculate a total derivative
for a scalar function f(x,y), without tensor notation:


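$$\frac{df}{d\lambda} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial\lambda} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial\lambda}$$
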
This is just the generalization of the chain rule to a function of two
variables. For example, if $\lambda$ represents time and f temperature,
then this would tell us the rate of change of the temperature as
a thermometer was carried through space. Applying this to the
present problem, we express the total covariant derivative as


$$\begin{aligned}
\nabla_\lambda T^i &= \left(\nabla_b T^i\right)\frac{dx^b}{d\lambda} \\
&= \left(\partial_b T^i + \Gamma^i{}_{bc}T^c\right)\frac{dx^b}{d\lambda}
\end{aligned}$$


a / The geodesic, 1, preserves tangency under parallel transport.
The non-geodesic curve, 2, doesn’t have this property;
a vector initially tangent to the curve is no longer tangent to it
when parallel-transported along it.


5.7.3 The geodesic equation


Recognizing $\partial_b T^i\,dx^b/d\lambda$ as a total non-covariant derivative,
we find


$$\nabla_\lambda T^i = \frac{dT^i}{d\lambda} + \Gamma^i{}_{bc}T^c\frac{dx^b}{d\lambda}$$


Substituting $\partial x^i/\partial\lambda$ for $T^i$, and setting the covariant derivative
equal to zero, we obtain


$$\frac{d^2x^i}{d\lambda^2} + \Gamma^i{}_{bc}\frac{dx^c}{d\lambda}\frac{dx^b}{d\lambda} = 0$$


This is known as the geodesic equation. There is a factor of two that
is a common gotcha when applying this equation. The symmetry of
the Christoffel symbols $\Gamma^i{}_{bc} = \Gamma^i{}_{cb}$ implies that when b and c are
distinct, the same term will appear twice in the summation.
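

To see where the factor of two shows up, it may help to write the equation out for a unit sphere with coordinates $(\theta, \phi)$. This worked example assumes the sphere's nonvanishing Christoffel symbols, $\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi{}_{\theta\phi} = \Gamma^\phi{}_{\phi\theta} = \cot\theta$, which are standard but are not derived until later in the text.

```latex
% i = theta: only the (b,c) = (phi,phi) term survives, so there is no doubling.
\frac{d^2\theta}{d\lambda^2}
  - \sin\theta\cos\theta\left(\frac{d\phi}{d\lambda}\right)^2 = 0

% i = phi: the (theta,phi) and (phi,theta) terms are equal and both appear
% in the sum, which produces the factor of 2.
\frac{d^2\phi}{d\lambda^2}
  + 2\cot\theta\,\frac{d\theta}{d\lambda}\frac{d\phi}{d\lambda} = 0
```

Omitting the 2 in the second equation is exactly the gotcha described above.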


If this differential equation is satisfied for one affine parameter 
$\lambda$, then it is also satisfied for any other affine parameter $\lambda' = a\lambda + b$,
where a and b are constants (problem 5). Recall that affine parameters
are only defined along geodesics, not along arbitrary curves.
We can’t start by defining an affine parameter and then use it to 
find geodesics using this equation, because we can’t define an affine 
parameter without first specifying a geodesic. Likewise, we can’t 
do the geodesic first and then the affine parameter, because if we 
already had a geodesic in hand, we wouldn’t need the differential 
equation in order to find a geodesic. The solution to this chicken- 
and-egg conundrum is to write down the differential equations and 
try to find a solution, without trying to specify either the affine pa- 
rameter or the geodesic in advance. We will seldom have occasion 
to resort to this technique, an exception being example 19 on page 
344. 


5.7.4 Uniqueness 


The geodesic equation is useful in establishing one of the neces- 
sary theoretical foundations of relativity, which is the uniqueness of 
geodesics for a given set of initial conditions. This is related to ax- 
iom O1 of ordered geometry, that two points determine a line, and 
is necessary physically for the reasons discussed on page 22; briefly, 
if the geodesic were not uniquely determined, then particles would 
have no way of deciding how to move. The form of the geodesic 
equation guarantees uniqueness. To see this, consider the following 
algorithm for determining a numerical approximation to a geodesic:

1. Initialize $\lambda$, the $x^i$ and their derivatives $dx^i/d\lambda$. Also, set a
small step-size $\Delta\lambda$ by which to increment $\lambda$ at each step below.

2. For each i, calculate $d^2x^i/d\lambda^2$ using the geodesic equation.

3. Add $(d^2x^i/d\lambda^2)\Delta\lambda$ to the currently stored value of $dx^i/d\lambda$.

4. Add $(dx^i/d\lambda)\Delta\lambda$ to $x^i$.

5. Add $\Delta\lambda$ to $\lambda$.

6. Repeat steps 2-5 until the geodesic has been extended to the
desired affine distance.


Since the result of the calculation depends only on the inputs at 
step 1, we find that the geodesic is uniquely determined. 
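

The six steps above can be sketched in a few lines of code. This is only an illustration under assumptions of our own: the space is a unit sphere with coordinates (theta, phi), whose Christoffel symbols are taken as known, and the function names are hypothetical. It is not the program presented in section 5.9.2.

```python
import math

def sphere_accel(x, dx):
    """Step 2: second derivatives from the geodesic equation on a unit sphere.

    x = (theta, phi) and dx = their derivatives with respect to lambda.
    Assumes Gamma^theta_{phi phi} = -sin(theta) cos(theta) and
    Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cot(theta).
    """
    theta = x[0]
    dtheta, dphi = dx
    d2theta = math.sin(theta) * math.cos(theta) * dphi ** 2
    # The factor of 2 comes from the two equal mixed terms in the sum.
    d2phi = -2.0 * (math.cos(theta) / math.sin(theta)) * dtheta * dphi
    return (d2theta, d2phi)

def integrate_geodesic(x, dx, lam_end, dlam=1e-4, accel=sphere_accel):
    """Extend a geodesic from lambda = 0 to about lam_end in steps of dlam."""
    lam = 0.0                          # step 1: initialize
    x, dx = list(x), list(dx)
    for _ in range(round(lam_end / dlam)):   # step 6: repeat
        acc = accel(x, dx)             # step 2
        for i in range(len(x)):
            dx[i] += acc[i] * dlam     # step 3
            x[i] += dx[i] * dlam       # step 4
        lam += dlam                    # step 5
    return lam, x, dx

# Start on the equator heading due east; the geodesic should stay on the equator.
lam, x, dx = integrate_geodesic((math.pi / 2, 0.0), (0.0, 1.0), 1.0)
print(lam, x, dx)
```

Nothing in the loop involves a choice: given the same initial position, initial derivatives, and step size, the function always returns the same curve, which is the content of the uniqueness argument.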


To see that this is really a valid way of proving uniqueness, it
may be helpful to consider how the proof could have failed. Omitting
some of the details of the tensors and the multidimensionality of the
space, the form of the geodesic equation is essentially $\ddot{x} + f\dot{x}^2 = 0$,
where dots indicate derivatives with respect to $\lambda$. Suppose that it
had instead had the form $\ddot{x}^2 + f\dot{x} = 0$. Then at step 2 we would
have had to pick either a positive or a negative square root for $\ddot{x}$.
Although continuity would usually suffice to maintain a consistent
sign from one iteration to the next, that would not work if we ever
came to a point where $\ddot{x}$ vanished momentarily. An equation of this
form therefore would not have a unique solution for a given set of
initial conditions.


The practical use of this algorithm to compute geodesics numer- 
ically is demonstrated in section 5.9.2 on page 189. 


5.8 Torsion 


This section describes the concept of gravitational torsion. It can
be skipped without loss of continuity, provided that you accept the
symmetry property $\Gamma^a{}_{[bc]} = 0$ without worrying about what it means
physically or what empirical evidence supports it.


Self-check: Interpret the mathematical meaning of the equation
$\Gamma^a{}_{[bc]} = 0$, which is expressed in the notation introduced on page
103.


5.8.1 Are scalars path-dependent? 


It seems clear that something like the covariant derivative is
needed for vectors, since they have a direction in spacetime, and
thus their measures vary when the measure of spacetime itself varies.
Since scalars don’t have a direction in spacetime, the same reasoning
doesn’t apply to them, and this is reflected in our rules for covariant
derivatives. The covariant derivative has one $\Gamma$ term for every index
of the tensor being differentiated, so for a scalar there should be no
$\Gamma$ terms at all, i.e., $\nabla_a$ is the same as $\partial_a$.


a / Measuring $\partial^2 T/\partial x\,\partial y$ for a scalar T.


But just because derivatives of scalars don’t require special treat- 
ment for this particular reason, that doesn’t mean they are guaran- 
teed to behave as we intuitively expect, in the strange world of 
coordinate-invariant relativity. 


One possible way for scalars to behave counterintuitively would 
be by analogy with parallel transport of vectors. If we stick a vector 
in a box (as with, e.g., the gyroscopes aboard Gravity Probe B) and 
carry it around a closed loop, it changes. Could the same happen 
with a scalar? This is extremely counterintuitive, since there is no 
reason to imagine such an effect in any of the models we’ve con- 
structed of curved spaces. In fact, it is not just counterintuitive but 
mathematically impossible, according to the following argument. 
The only reason we can interpret the vector-in-a-box effect as aris- 
ing from the geometry of spacetime is that it applies equally to all 
vectors. If, for example, it only applied to the magnetic polariza- 
tion vectors of ferromagnetic substances, then we would interpret 
it as a magnetic field living in spacetime, not a property of space- 
time itself. If the value of a scalar-in-a-box was path-dependent, 
and this path-dependence was a geometric property of spacetime, 
then it would have to apply to all scalars, including, say, masses 
and charges of particles. Thus if an electron’s mass increased by 1% 
when transported in a box along a certain path, its charge would 
have to increase by 1% as well. But then its charge-to-mass ra- 
tio would remain invariant, and this is a contradiction, since the 
charge-to-mass ratio is also a scalar, and should have felt the same 
1% effect. Since the varying scalar-in-a-box idea leads to a contra- 
diction, it wasn’t a coincidence that we couldn’t find a model that 
produced such an effect; a theory that lacks self-consistency doesn’t 
have any models. 


Self-check: Explain why parallel transporting a vector can only 
rotate it, not change its magnitude. 


There is, however, a different way in which scalars could behave
counterintuitively, and this one is mathematically self-consistent.
Suppose that Helen lives in two spatial dimensions and owns a thermometer.
She wants to measure the spatial variation of temperature,
in particular its mixed second derivative $\partial^2 T/\partial x\,\partial y$. At home
in the morning at point A, she prepares by calibrating her gyrocompass
to point north and measuring the temperature. Then she
travels $\ell$ = 1 km east along a geodesic to B, consults her gyrocompass,
and turns north. She continues one kilometer north to C,
samples the change in temperature $\Delta T_1$ relative to her home, and
then retraces her steps to come home for lunch. In the afternoon,
she checks her work by carrying out the same process, but this time
she interchanges the roles of north and east, traveling along ADE.
If she were living in a flat space, this would form the other two sides
of a square, and her afternoon temperature sample $\Delta T_2$ would be
at the same point in space C as her morning sample. She actually
doesn’t recognize the landscape, so the sample points C and E are
different, but this just confirms what she already knew: the space
isn’t flat.¹⁰


None of this seems surprising yet, but there are now two qualitatively
different ways that her analysis of her data could turn out,
indicating qualitatively different things about the laws of physics
in her universe. The definition of the derivative as a limit requires
that she repeat the experiment at smaller scales. As $\ell \rightarrow 0$, the
result for $\partial^2 T/\partial x\,\partial y$ should approach a definite limit, and the error
should diminish in proportion to $\ell$. In particular the difference
between the results inferred from $\Delta T_1$ and $\Delta T_2$ indicates an error,
and the discrepancy between the second derivatives inferred from
them should shrink appropriately as $\ell$ shrinks. Suppose this doesn’t
happen. Since partial derivatives commute, we conclude that her
measuring procedure is not the same as a partial derivative. Let’s
call her measuring procedure $\nabla$, so that she is observing a discrepancy
between $\nabla_x\nabla_y$ and $\nabla_y\nabla_x$. The fact that the commutator
$\nabla_x\nabla_y - \nabla_y\nabla_x$ doesn’t vanish cannot be explained by the Christoffel
symbols, because what she’s differentiating is a scalar. Since the
discrepancy arises entirely from the failure of $\Delta T_1 - \Delta T_2$ to scale
down appropriately, the conclusion is that the distance $\delta$ between
the two sampling points is not scaling down as quickly as we expect.
In our familiar models of two-dimensional spaces as surfaces
embedded in three-space, we always have $\delta \sim \ell^3$ for small $\ell$, but she
has found that it only shrinks as quickly as $\ell^2$.
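

The difference between the two scaling laws can be made concrete with a schematic estimate. The sketch below assumes that the two temperature samples differ by roughly the temperature gradient times $\delta$, and that each inferred second derivative is a temperature difference divided by $\ell^2$. All constants of proportionality are set to 1, which is an arbitrary simplification of ours.

```python
def inferred_discrepancy(ell, delta_of_ell, grad_T=1.0):
    """Schematic discrepancy between the two inferred mixed second derivatives.

    The samples are taken a distance delta apart, so they differ by about
    grad_T * delta; dividing by ell**2 converts this to a second derivative.
    """
    return grad_T * delta_of_ell(ell) / ell ** 2

def curvature_like(ell):
    """Gap between C and E in the familiar models: delta ~ ell^3."""
    return ell ** 3

def torsion_like(ell):
    """Gap between C and E in Helen's universe: delta ~ ell^2."""
    return ell ** 2

scales = [1.0, 0.1, 0.01, 0.001]
with_curvature = [inferred_discrepancy(l, curvature_like) for l in scales]
with_torsion = [inferred_discrepancy(l, torsion_like) for l in scales]

print(with_curvature)  # shrinks in proportion to ell
print(with_torsion)    # does not shrink at all
```

With $\delta \sim \ell^3$ the discrepancy dies away in proportion to $\ell$, as an ordinary measurement error should, while with $\delta \sim \ell^2$ it stays fixed no matter how small she makes her loops.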


For a clue as to what is going on, note that the commutator
$\nabla_x\nabla_y - \nabla_y\nabla_x$ has a particular handedness to it. For example,
it flips its sign under a reflection across the line y = x. When we
“parallel”-transport vectors, they aren’t actually staying parallel. In
this hypothetical universe, a vector in a box transported by a small
distance $\ell$ rotates by an angle proportional to $\ell$. This effect is called
torsion. Although no torsion effect shows up in our familiar models, 
that is not because torsion lacks self-consistency. Models of spaces 
with torsion do exist. In particular, we can see that torsion doesn’t 
lead to the same kind of logical contradiction as the varying-scalar- 
in-a-box idea. Since all vectors twist by the same amount when 
transported, inner products are preserved, so it is not possible to 
put two vectors in one box and get the scalar-in-a-box paradox by 
watching their inner product change when the box is transported. 


Note that the elbows ABC and ADE are not right angles. If
Helen had brought a pair of gyrocompasses with her, one for x and
one for y, she would have found that the right angle between the
gyrocompasses was preserved under parallel transport, but that a
gyrocompass initially tangent to a geodesic did not remain so. There
are in fact two inequivalent definitions of a geodesic in a space with
torsion. The shortest path between two points is not necessarily
the same as the straightest possible path, i.e., the one that
parallel-transports its own tangent vector.


¹⁰ This point was mentioned on page 168, in connection with the definition of
the Riemann tensor.


b / The gyroscopes both rotate when transported from A
to B, causing Helen to navigate along BC, which does not form
a right angle with AB. The angle between the two gyroscopes’
axes is always the same, so the rotation is not locally observable,
but it does produce an observable gap between C and E.

5.8.2 The torsion tensor 


Since torsion is odd under parity, it must be represented by an
odd-rank tensor, which we call $\tau^c{}_{ab}$, and define according to


$$\left(\nabla_a\nabla_b - \nabla_b\nabla_a\right)f = -\tau^c{}_{ab}\nabla_c f$$


where f is any scalar field, such as the temperature in the preceding
section. There are two different ways in which a space can be
non-Euclidean: it can have curvature, or it can have torsion. For
a full discussion of how to handle the mathematics of a spacetime
with both curvature and torsion, see the article by Steuard Jensen at
http://www.slimy.com/~steuard/teaching/tutorials/GRtorsion.pdf.
For our present purposes, the main mathematical fact worth
noting is that vanishing torsion is equivalent to the symmetry
$\Gamma^a{}_{bc} = \Gamma^a{}_{cb}$ of the Christoffel symbols. Using the notation introduced on
page 103, $\Gamma^a{}_{[bc]} = 0$ if $\tau = 0$.


c / Three gyroscopes are initially aligned with the x, y, and
z axes. After parallel transport along the geodesic x axis, the
x gyro is still aligned with the x axis, but the y and z gyros have
rotated.


Self-check: Use an argument similar to the one in example 5 
on page 166 to prove that no model of a two-space embedded in a 
three-space can have torsion. 


Generalizing to more dimensions, the torsion tensor is odd under
the full spacetime reflection $x_a \rightarrow -x_a$, i.e., a parity inversion plus
a time-reversal, PT.


In the story above, we had a torsion that didn’t preserve tangent
vectors. In three or more dimensions, however, it is possible
to have torsion that does preserve tangent vectors. For example,
transporting a vector along the x axis could cause only a rotation in
the y-z plane. This relates to the symmetries of the torsion tensor,
which for convenience we’ll write in an x-y-z coordinate system and
in the fully covariant form $\tau_{\lambda\mu\nu}$. The definition of the torsion tensor
implies $\tau_{\lambda(\mu\nu)} = 0$, i.e., that the torsion tensor is antisymmetric in
its two final indices. Torsion that does not preserve tangent vectors
will have nonvanishing elements such as $\tau_{xxy}$, meaning that
parallel-transporting a vector along the x axis can change its x component.
Torsion that preserves tangent vectors will have vanishing $\tau_{\lambda\mu\nu}$ unless
$\lambda$, $\mu$, and $\nu$ are all distinct. This is an example of the type
of antisymmetry that is familiar from the vector cross product, in
which the cross products of the basis vectors behave as x × y = z,
y × z = x, z × x = y. Generalizing the notation for symmetrization
and antisymmetrization of tensors from page 103, we have


$$\begin{aligned}
T_{(abc)} &= \frac{1}{3!}\sum T_{abc} \\
T_{[abc]} &= \frac{1}{3!}\sum \epsilon^{abc}\,T_{abc}
\end{aligned}$$


where the sums are over all permutations of the indices, and in the
second line we have used the Levi-Civita symbol. In this notation,
a totally antisymmetric torsion tensor is one with $\tau_{\lambda\mu\nu} = \tau_{[\lambda\mu\nu]}$, and
torsion of this type preserves tangent vectors under translation.


In two dimensions, there are no totally antisymmetric objects
with three indices, because we can’t write three indices without
repeating one. In three dimensions, an antisymmetric object with
three indices is simply a multiple of the Levi-Civita tensor, so a
totally antisymmetric torsion, if it exists, is represented by a single
number; under translation, vectors rotate like either right-handed
or left-handed screws, and this number tells us the rate of rotation.
In four dimensions, we have four independently variable quantities,
$\tau_{xyz}$, $\tau_{tyz}$, $\tau_{txz}$, and $\tau_{txy}$. In other words, an antisymmetric torsion of
3+1 spacetime can be represented by a four-vector, $\tau^a = \epsilon^{abcd}\tau_{bcd}$.
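

The counting in this paragraph, and the antisymmetrization formula that precedes it, can be checked by brute force. The following is a sketch with hypothetical helper names; it stores a rank-3 object as a dictionary from index triples to values, and the permutation sign plays the role of the Levi-Civita symbol.

```python
from itertools import combinations, permutations

def perm_sign(p):
    """Sign (+1 or -1) of a permutation given as a tuple such as (2, 0, 1)."""
    p, sign = list(p), 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]  # each swap flips the sign
            sign = -sign
    return sign

def antisymmetrize(T, n):
    """Return T_[abc] in n dimensions for T given as {(a, b, c): value}."""
    out = {}
    for a in range(n):
        for b in range(n):
            for c in range(n):
                idx = (a, b, c)
                total = 0.0
                for p in permutations(range(3)):
                    permuted = tuple(idx[k] for k in p)
                    total += perm_sign(p) * T.get(permuted, 0.0)
                out[idx] = total / 6.0  # the factor of 1/3!
    return out

def independent_components(n):
    """Number of index triples with three distinct values, up to ordering."""
    return len(list(combinations(range(n), 3)))

A = antisymmetrize({(0, 1, 2): 6.0}, 3)
print(A[(0, 1, 2)], A[(1, 0, 2)], A[(1, 2, 0)], A[(0, 0, 1)])
print([independent_components(n) for n in (2, 3, 4)])
```

Components with a repeated index vanish, swapping two indices flips the sign, and the count of independent components comes out to 0, 1, and 4 in two, three, and four dimensions, matching the discussion above.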


5.8.3 Experimental searches for torsion 


One way of stating the equivalence principle (see p. 142) is that 
it forbids spacetime from coming equipped with a vector field that 
could be measured by free-falling observers, i.e., observers in local 
Lorentz frames. A variety of high-precision tests of the equivalence 
principle have been carried out. From the point of view of an ex- 
perimenter doing this kind of test, it is important to distinguish 
between fields that are “built in” to spacetime and those that live 
in spacetime. For example, the existence of the earth’s magnetic 
field does not violate the equivalence principle, but if an experi- 
ment was sensitive to the earth’s field, and the experimenter didn’t 
know about it, there would appear to be a violation. Antisymmet- 
ric torsion in four dimensions acts like a vector. If it constitutes 
a universal background effect built into spacetime, then it violates 
the equivalence principle. If it instead arises from specific material 
sources, then it may still show up as a measurable effect in
experimental tests of Lorentz invariance. Let’s consider
the latter possibility.


Since curvature in general relativity comes from mass and energy,
as represented by the stress-energy tensor $T_{ab}$, we could ask
what would be the sources of torsion, if it exists in our universe.
The source can’t be the rank-2 stress-energy tensor. It would have
to be an odd-rank tensor, i.e., a quantity that is odd under PT, and
in theories that include torsion it is commonly assumed that the
source is the quantum-mechanical angular momentum of subatomic
particles. If this is the case, then torsion effects are expected to be
proportional to $\hbar G$, the product of Planck’s constant and the
gravitational constant, and they should therefore be extremely small and
hard to measure. String theory, for example, includes torsion, but
nobody has found a way to test string theory empirically because it
essentially makes predictions about phenomena at the Planck scale,
$\sqrt{\hbar G/c^3} \sim 10^{-35}$ m, where both gravity and quantum mechanics
are strong effects.


d / The University of Washington torsion pendulum used to
search for torsion. The light gray wedges are Alnico, the darker
ones SmCo₅. The arrows with the filled heads represent the
directions of the electron spins, with denser arrows indicating
higher polarization. The arrows with the open heads show the
direction of the B field.


There are, however, some high-precision experiments that have 
a reasonable chance of detecting whether our universe has torsion. 
Torsion violates the equivalence principle, and by the turn of the 
century tests of the equivalence principle had reached a level of 
precision sufficient to rule out some models that include torsion. 
Figure d shows a torsion pendulum used in an experiment by the 
Eöt-Wash group at the University of Washington.¹¹ If torsion exists, 
then the intrinsic spin $\boldsymbol{\sigma}$ of an electron should have an energy 
$\boldsymbol{\sigma}\cdot\boldsymbol{\tau}$, where $\boldsymbol{\tau}$ is the spacelike part of the torsion vector. The 
torsion could be generated by the earth, the sun, or some other object at a 
greater distance. The interaction $\boldsymbol{\sigma}\cdot\boldsymbol{\tau}$ will modify the behavior of a 
torsion pendulum if the spins of the electrons in the pendulum are 
polarized nonrandomly, as in a magnetic material. The pendulum 
will tend to precess around the axis defined by $\boldsymbol{\tau}$. 


This type of experiment is extremely difficult, because the 
pendulum tends to act as an ultra-sensitive magnetic compass, resulting 
in a measurement of the ambient magnetic field rather than the 
hypothetical torsion field $\boldsymbol{\tau}$. To eliminate this source of systematic 
error, the UW group first eliminated the ambient magnetic field 
as well as possible, using mu-metal shielding and Helmholtz coils. 
They also constructed the pendulum out of a combination of two 
magnetic materials, Alnico 5 and $\mathrm{SmCo_5}$, in such a way that the 
magnetic dipole moment vanished, but the spin dipole moment did 
not; Alnico 5’s magnetic field is due almost entirely to electron spin, 
whereas the magnetic field of $\mathrm{SmCo_5}$ contains significant 
contributions from orbital motion. The result was a nonmagnetic object 
whose spins were polarized. After four years of data collection, they 
found $|\boldsymbol{\tau}| \lesssim 10^{-21}$ eV. Models that include torsion typically predict 
such effects to be of the order of $m_e^2/m_P \sim 10^{-17}$ eV, where $m_e$ is 
the mass of the electron and $m_P = \sqrt{\hbar c/G} \approx 10^{19}$ GeV $\approx 20\ \mu$g is 
the Planck mass. A wide class of these models is therefore ruled out 
by these experiments. 
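

The orders of magnitude involved here are easy to check numerically. The 
following Python sketch evaluates the Planck mass $\sqrt{\hbar c/G}$, the Planck 
length $\sqrt{\hbar G/c^3}$, and the ratio $m_e^2/m_P$ expressed as an energy. The 
rounded values of the constants are standard ones that I am supplying as 
assumptions; they are not taken from the text, and the function names are 
just illustrative. 

```python
import math

# Assumed standard (rounded) SI values; none of these numbers come from the text.
HBAR = 1.0546e-34   # reduced Planck constant, J s
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8         # speed of light, m/s
EV = 1.602e-19      # joules per electron-volt
M_E_EV = 0.511e6    # electron rest energy, eV

def planck_mass_kg():
    """Planck mass sqrt(hbar c / G) in kilograms."""
    return math.sqrt(HBAR * C / G)

def planck_length_m():
    """Planck length sqrt(hbar G / c^3) in meters."""
    return math.sqrt(HBAR * G / C**3)

def torsion_energy_scale_ev():
    """m_e^2 / m_P, with both masses expressed as rest energies in eV."""
    planck_energy_ev = planck_mass_kg() * C**2 / EV
    return M_E_EV**2 / planck_energy_ev

if __name__ == "__main__":
    print("Planck mass   : %.2e kg" % planck_mass_kg())
    print("Planck length : %.2e m" % planck_length_m())
    print("m_e^2 / m_P   : %.2e eV" % torsion_energy_scale_ev())
```

The point of the comparison is only that the predicted scale lies several 
orders of magnitude above the experimental bound, so the conclusion does 
not depend on the rounding of the constants. 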


Since there appears to be no experimental evidence for the exis- 
tence of gravitational torsion in our universe, we will assume from 
now on that it vanishes identically. Einstein made the same as- 
sumption when he originally created general relativity, although he 
and Cartan later tinkered with non-torsion-free theories in a failed 
attempt to unify gravity with electromagnetism. Some models that 
include torsion remain viable. For example, it has been argued that 
the torsion tensor should fall off quickly with distance from the 
source.¹² 


¹¹ http://arxiv.org/abs/hep-ph/0606218 

¹² Carroll and Field, http://arxiv.org/abs/gr-qc/9403058 


5.9 From metric to curvature 


5.9.1 Finding the Christoffel symbol from the metric 


We’ve already found the Christoffel symbol in terms of the metric 
in one dimension. Expressing it in tensor notation, we have 

$$\Gamma^c_{ab} = \frac{1}{2} g^{cd} \left(\partial_? g_{??}\right),$$

where inversion of the one-component matrix $G$ has been replaced 
by matrix inversion, and, more importantly, the question marks 
indicate that there would be more than one way to place the subscripts 
so that the result would be a grammatical tensor equation. The 
most general form for the Christoffel symbol would be 

$$\Gamma^c_{ab} = \frac{1}{2} g^{cd} \left(L\,\partial_b g_{ad} + M\,\partial_a g_{bd} + N\,\partial_d g_{ab}\right),$$

where $L$, $M$, and $N$ are constants, and $a$ is the index with respect to 
which the covariant derivative $\nabla_a v^c = \partial_a v^c + \Gamma^c_{ab} v^b$ differentiates. 
Consistency with the one-dimensional expression requires $L + M + N = 1$, 
and vanishing torsion gives $L = M$. The $L$ and $M$ terms have a different 
physical significance than the $N$ term. 


Suppose an observer uses coordinates such that all objects are 
described as lengthening over time, and the change of scale 
accumulated over one day is a factor of $k > 1$. This is described by the 
derivative $\partial_t g_{xx} < 1$, which affects the $M$ term. Since the metric is 
used to calculate squared distances, the $g_{xx}$ matrix element scales 
down by $1/\sqrt{k}$. To compensate for $\partial_t v^x < 0$, we need to add a 
positive correction term, $M > 0$, to the covariant derivative. When 
the same observer measures the rate of change of a vector $v^t$ with 
respect to space, the rate of change comes out to be too small, 
because the variable she differentiates with respect to is too big. This 
requires $N < 0$, and the correction is of the same size as the $M$ 
correction, so $|M| = |N|$. We find $L = M = -N = 1$. 


Self-check: Does the above argument depend on the use of space 
for one coordinate and time for the other? 


The resulting general expression for the Christoffel symbol in 
terms of the metric is 

$$\Gamma^c_{ab} = \frac{1}{2} g^{cd} \left(\partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab}\right).$$

One can readily go back and check that this gives $\nabla_c g_{ab} = 0$. In fact, 
the calculation is a bit tedious. For that matter, tensor calculations 
in general can be infamously time-consuming and error-prone. Any 
reasonable person living in the 21st century will therefore resort to 
a computer algebra system. The most widely used computer 
algebra system is Mathematica, but it’s expensive and proprietary, and 
it doesn’t have extensive built-in facilities for handling tensors. It 
turns out that there is quite a bit of free and open-source tensor 
software, and it falls into two classes: coordinate-based and 
coordinate-independent. The best open-source coordinate-independent 
facility available appears to be Cadabra, and in fact the verification of 
$\nabla_c g_{ab} = 0$ is the first example given in Leo Brewin’s handy guide 
to applications of Cadabra to general relativity.¹³ 


Self-check: In the case of 1 dimension, show that this reduces to 
the earlier result of —(1/2)dG/dX. 
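

Short of a full computer-algebra treatment, a numerical spot check is 
easy to write. The sketch below tests $\nabla_c g_{ab} = \partial_c g_{ab} - \Gamma^d_{ca} g_{db} - \Gamma^d_{cb} g_{ad} = 0$ 
for polar coordinates in the Euclidean plane, using the metric 
$ds^2 = dr^2 + r^2 d\phi^2$ and the values $\Gamma^r_{\phi\phi} = -r$ and $\Gamma^\phi_{r\phi} = 1/r$ 
quoted in section 5.9.2. The function names, the sample point, and the 
finite-difference step are my own choices for illustration. 

```python
def metric(x):
    """Euclidean plane in polar coordinates; index 0 = r, 1 = phi."""
    r, _phi = x
    return [[1.0, 0.0], [0.0, r * r]]

def christoffel(x):
    """gam[d][a][b] = Gamma^d_ab for polar coordinates in the plane."""
    r, _phi = x
    gam = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    gam[0][1][1] = -r
    gam[1][0][1] = gam[1][1][0] = 1.0 / r
    return gam

def d_metric(x, c, h=1e-6):
    """Central-difference estimate of partial_c g_ab."""
    xp, xm = list(x), list(x)
    xp[c] += h
    xm[c] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(2)]
            for a in range(2)]

def covariant_derivative_of_metric(x):
    """out[c][a][b] = partial_c g_ab - Gamma^d_ca g_db - Gamma^d_cb g_ad."""
    g, gam = metric(x), christoffel(x)
    out = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    for c in range(2):
        dg = d_metric(x, c)
        for a in range(2):
            for b in range(2):
                corr = sum(gam[d][c][a] * g[d][b] + gam[d][c][b] * g[a][d]
                           for d in range(2))
                out[c][a][b] = dg[a][b] - corr
    return out

if __name__ == "__main__":
    result = covariant_derivative_of_metric((2.0, 0.3))
    worst = max(abs(result[c][a][b])
                for c in range(2) for a in range(2) for b in range(2))
    print("largest component of the covariant derivative of g:", worst)
```

Every component should come out zero to within the rounding error of the 
finite differences. This is only a check at one point in one coordinate 
system, not a proof. 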


Since $\Gamma$ is not a tensor, it is not obvious that the covariant 
derivative, which is constructed from it, is a tensor. But if it isn’t obvious, 
neither is it surprising — the goal of the above derivation was to get 
results that would be coordinate-independent. 


Christoffel symbols on the globe, quantitatively Example: 11 
In example 9 on page 177, we inferred the following properties 
for the Christoffel symbol $\Gamma^\theta_{\phi\phi}$ on a sphere of radius $R$: $\Gamma^\theta_{\phi\phi}$ is 
independent of $\phi$ and $R$, $\Gamma^\theta_{\phi\phi} < 0$ in the northern hemisphere 
(colatitude $\theta$ less than $\pi/2$), $\Gamma^\theta_{\phi\phi} = 0$ on the equator, and $\Gamma^\theta_{\phi\phi} > 0$ 
in the southern hemisphere. 


The metric on a sphere is $ds^2 = R^2 d\theta^2 + R^2 \sin^2\theta\, d\phi^2$. The only 
nonvanishing term in the expression for $\Gamma^\theta_{\phi\phi}$ is the one involving 
$\partial_\theta g_{\phi\phi} = 2R^2 \sin\theta \cos\theta$. The result is $\Gamma^\theta_{\phi\phi} = -\sin\theta \cos\theta$, which 
can be verified to have the properties claimed above. 
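

The same result can be obtained by brute force. The following Python 
sketch evaluates $\Gamma^c_{ab} = \frac{1}{2} g^{cd}(\partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab})$ 
numerically for the sphere, estimating the derivatives of the metric by 
central differences. It assumes a diagonal metric so that the inverse is 
trivial; the function names and the step size are hypothetical choices of 
mine, not anything standard. 

```python
import math

def sphere_metric(x, R=1.0):
    """Metric ds^2 = R^2 dtheta^2 + R^2 sin^2(theta) dphi^2; x = (theta, phi)."""
    theta, _phi = x
    return [[R * R, 0.0], [0.0, (R * math.sin(theta)) ** 2]]

def metric_derivatives(metric, x, h=1e-5):
    """dg[d][a][b] approximates partial_d g_ab by central differences."""
    n = len(x)
    dg = []
    for d in range(n):
        xp, xm = list(x), list(x)
        xp[d] += h
        xm[d] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)]
                   for a in range(n)])
    return dg

def christoffel(metric, x):
    """gam[c][a][b] approximates Gamma^c_ab, assuming a diagonal metric."""
    n = len(x)
    g = metric(x)
    g_inv = [1.0 / g[i][i] for i in range(n)]  # inverse of a diagonal matrix
    dg = metric_derivatives(metric, x)
    gam = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for c in range(n):
        for a in range(n):
            for b in range(n):
                # for a diagonal metric, g^{cd} is nonzero only when d = c
                gam[c][a][b] = 0.5 * g_inv[c] * (
                    dg[a][b][c] + dg[b][a][c] - dg[c][a][b])
    return gam

if __name__ == "__main__":
    theta = 1.0
    gam = christoffel(sphere_metric, (theta, 0.5))
    print("numerical Gamma^theta_phiphi:", gam[0][1][1])
    print("-sin(theta) cos(theta)      :", -math.sin(theta) * math.cos(theta))
```

Passing a metric with a different radius, e.g. 
`lambda x: sphere_metric(x, R=3.0)`, gives a way to test the claim that 
the result is independent of $R$. 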


5.9.2 Numerical solution of the geodesic equation 


On page 180 I gave an algorithm that demonstrated the unique- 
ness of the solutions to the geodesic equation. This algorithm can 
also be used to find geodesics in cases where the metric is known. 
The following program, written in the computer language Python, 
carries out a very simple calculation of this kind, in a case where 
we know what the answer should be; even without any previous 
familiarity with Python, it shouldn’t be difficult to see the corre- 
spondence between the abstract algorithm presented on page 180 
and its concrete realization below. For polar coordinates in a 
Euclidean plane, one can compute $\Gamma^r_{\phi\phi} = -r$ and $\Gamma^\phi_{r\phi} = 1/r$ (problem 
2, page 209). Here we compute the geodesic that starts out tangent 
to the unit circle at $\phi = 0$. 


 1  import math
 2
 3  l = 0       # affine parameter lambda
 4  dl = .001   # change in l with each iteration
 5  l_max = 100.
 6
 7  # initial position:
 8  r = 1
 9  phi = 0
10  # initial derivatives of coordinates w.r.t. lambda
11  vr = 0
12  vphi = 1
13
14  k = 0 # keep track of how often to print out updates
15  while l<l_max:
16    l = l+dl
17    # Christoffel symbols:
18    Grphiphi = -r
19    Gphirphi = 1/r
20    # second derivatives:
21    ar   = -Grphiphi*vphi*vphi
22    aphi = -2.*Gphirphi*vr*vphi
23    # ... factor of 2 because G^a_{bc}=G^a_{cb} and b
24    #     is not the same as c
25    # update velocity:
26    vr = vr + dl*ar
27    vphi = vphi + dl*aphi
28    # update position:
29    r = r + vr*dl
30    phi = phi + vphi*dl
31    if k%10000==0: # k is divisible by 10000
32      phi_deg = phi*180./math.pi
33      print("lambda=%6.2f r=%6.2f phi=%6.2f deg." % (l,r,phi_deg))
34    k = k+1


¹³ http://arxiv.org/abs/0903.2085 


It is not necessary to worry about all the technical details of the 
language (e.g., line 1, which makes available such conveniences as 
math.pi for $\pi$). Comments are set off by pound signs. Lines 16-34 
are indented because they are all to be executed repeatedly, until it 
is no longer true that $\lambda < \lambda_{max}$ (line 15). 


Self-check: By inspecting lines 18-22, find the signs of $\ddot{r}$ and $\ddot{\phi}$ 
at $\lambda = 0$. Convince yourself that these signs are what we expect 
geometrically. 


The output is as follows: 


lambda=  0.00 r=  1.00 phi=  0.06 deg.
lambda= 10.00 r= 10.06 phi= 84.23 deg.
lambda= 20.00 r= 20.04 phi= 87.07 deg.
lambda= 30.00 r= 30.04 phi= 88.02 deg.
lambda= 40.00 r= 40.04 phi= 88.50 deg.
lambda= 50.00 r= 50.04 phi= 88.78 deg.
lambda= 60.00 r= 60.05 phi= 88.98 deg.
lambda= 70.00 r= 70.05 phi= 89.11 deg.
lambda= 80.00 r= 80.06 phi= 89.21 deg.
lambda= 90.00 r= 90.06 phi= 89.29 deg.


We can see that $\phi \to 90$ deg. as $\lambda \to \infty$, which makes sense, 
because the geodesic is a straight line parallel to the y axis. 
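

A more direct way to see the straight line is to convert the result back 
to Cartesian coordinates, where the exact geodesic is $x = 1$, $y = \lambda$. 
The sketch below wraps the same stepping scheme in a function (the name 
polar_geodesic and its arguments are my own invention) and makes that 
comparison. The agreement is only approximate, because the stepping 
scheme accumulates discretization error. 

```python
import math

def polar_geodesic(l_max, dl=0.001):
    """Step the geodesic equation in plane polar coordinates.

    Starts at r=1, phi=0, moving tangent to the unit circle, and uses the
    same update order as the listing: velocities first, then positions.
    Returns the final (lambda, r, phi).
    """
    l, r, phi = 0.0, 1.0, 0.0
    vr, vphi = 0.0, 1.0
    while l < l_max:
        l += dl
        ar = r * vphi * vphi           # -Gamma^r_{phi phi} vphi^2, with Gamma = -r
        aphi = -2.0 * vr * vphi / r    # -2 Gamma^phi_{r phi} vr vphi, with Gamma = 1/r
        vr += dl * ar
        vphi += dl * aphi
        r += vr * dl
        phi += vphi * dl
    return l, r, phi

if __name__ == "__main__":
    l, r, phi = polar_geodesic(10.0)
    x, y = r * math.cos(phi), r * math.sin(phi)
    print("lambda=%.3f  x=%.3f (exact 1)  y=%.3f (exact %.3f)" % (l, x, y, l))
```

Reducing dl should bring x closer to 1, at the cost of a longer running 
time. 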


A less trivial use of the technique is demonstrated on page 233, 
where we calculate the deflection of light rays in a gravitational field, 
one of the classic observational tests of general relativity. 


5.9.3 The Riemann tensor in terms of the Christoffel symbols 


The covariant derivative of a vector can be interpreted as the rate 
of change of a vector in a certain direction, relative to the result of 
parallel-transporting the original vector in the same direction. We 
can therefore see that the definition of the Riemann curvature tensor 
on page 168 is a measure of the failure of covariant derivatives to 
commute: 


$$(\nabla_a \nabla_b - \nabla_b \nabla_a) A^c = A^d R^c{}_{dab}$$


A tedious calculation now gives $R$ in terms of the $\Gamma$s: 


$$R^a{}_{bcd} = \partial_c \Gamma^a_{db} - \partial_d \Gamma^a_{cb} + \Gamma^a_{ce}\Gamma^e_{db} - \Gamma^a_{de}\Gamma^e_{cb}$$


This is given as another example later in Brewin’s manual for 
applying Cadabra to general relativity.¹⁴ (Brewin writes the upper index 
in the second slot of $R$.) 
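

For a concrete case, the expression 
$R^a{}_{bcd} = \partial_c \Gamma^a_{db} - \partial_d \Gamma^a_{cb} + \Gamma^a_{ce}\Gamma^e_{db} - \Gamma^a_{de}\Gamma^e_{cb}$ 
can be evaluated numerically. The Python sketch below does this for the 
unit sphere, taking as input $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and 
$\Gamma^\phi_{\theta\phi} = \cot\theta$ and differentiating them by central differences. 
The function names and step size are my own illustrative choices. A short 
hand calculation with the same inputs gives $R^\theta{}_{\phi\theta\phi} = \sin^2\theta$, 
which the code should reproduce. 

```python
import math

N = 2  # index 0 = theta, 1 = phi

def zeros3():
    return [[[0.0] * N for _ in range(N)] for _ in range(N)]

def christoffel(theta):
    """gam[a][b][c] = Gamma^a_bc on the unit sphere."""
    gam = zeros3()
    gam[0][1][1] = -math.sin(theta) * math.cos(theta)
    gam[1][0][1] = gam[1][1][0] = math.cos(theta) / math.sin(theta)
    return gam

def christoffel_derivatives(theta, h=1e-5):
    """dgam[c][a][b][d] approximates partial_c Gamma^a_bd.

    Nothing depends on phi, so the phi derivatives are identically zero.
    """
    gp, gm = christoffel(theta + h), christoffel(theta - h)
    d_theta = [[[(gp[a][b][d] - gm[a][b][d]) / (2 * h) for d in range(N)]
                for b in range(N)] for a in range(N)]
    return [d_theta, zeros3()]

def riemann(theta):
    """R[a][b][c][d] approximates R^a_bcd on the unit sphere."""
    gam = christoffel(theta)
    dgam = christoffel_derivatives(theta)
    R = [[[[0.0] * N for _ in range(N)] for _ in range(N)] for _ in range(N)]
    for a in range(N):
        for b in range(N):
            for c in range(N):
                for d in range(N):
                    quad = sum(gam[a][c][e] * gam[e][d][b]
                               - gam[a][d][e] * gam[e][c][b]
                               for e in range(N))
                    R[a][b][c][d] = dgam[c][a][d][b] - dgam[d][a][c][b] + quad
    return R

if __name__ == "__main__":
    theta = 0.7
    R = riemann(theta)
    print("R^theta_{phi theta phi}:", R[0][1][0][1])
    print("sin^2(theta)           :", math.sin(theta) ** 2)
```

The antisymmetry of the result under exchange of the last two indices can 
be checked the same way, by comparing R[0][1][0][1] with R[0][1][1][0]. 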


5.9.4 Some general ideas about gauge 


Let’s step back now for a moment and try to gain some physi- 
cal insight by looking at the features that the electromagnetic and 
relativistic gauge transformations have in common. We have the 
following analogies: 


¹⁴ http://arxiv.org/abs/0903.2085 
a / The Aharonov-Bohm effect. 
An electron enters a beam 
splitter at P, and is sent out in 
two different directions. The two 
parts of the wave are reflected so 
that they reunite at Q. The arrows 
represent the vector potential A. 
The observable magnetic field 
B is zero everywhere outside 
the solenoid, and yet the inter- 
ference observed at Q depends 
on whether the field is turned 
on. See page 137 for further 
discussion of the A and B fields 
of a solenoid. 


                          electromagnetism             differential geometry

global symmetry           A constant phase shift       Adding a constant onto a
                          $\alpha$ has no observable   coordinate has no
                          effects.                     observable effects.

local symmetry            A phase shift $\alpha$ that  An arbitrary coordinate
                          varies from point to point   transformation has no
                          has no observable effects.   observable effects.

The gauge is              $\alpha$                     $g_{\mu\nu}$
described by ...

... and differentiation   $A_b$                        $\Gamma^c_{ab}$
of this gives the
gauge field ...

A second differentiation  $\mathbf{E}$ and             $R^c{}_{dab}$
gives the directly        $\mathbf{B}$
observable field(s) ...


The interesting thing here is that the directly observable fields 
do not carry all of the necessary information, but the gauge fields are 
not directly observable. In electromagnetism, we can see this from 
the Aharonov-Bohm effect, shown in figure a.¹⁵ The solenoid has 
B = 0 externally, and the electron beams only ever move through 
the external region, so they never experience any magnetic field. Ex- 
periments show, however, that turning the solenoid on and off does 
change the interference between the two beams. This is because the 
vector potential does not vanish outside the solenoid, and as we’ve 
seen on page 137, the phase of the beams varies according to the 
path integral of the $A_b$. We are therefore left with an 
uncomfortable, but unavoidable, situation. The concept of a field is supposed 
to eliminate the need for instantaneous action at a distance, which 
is forbidden by relativity; that is, (1) we want our fields to have only 
local effects. On the other hand, (2) we would like our fields to be 
directly observable quantities. We cannot have both 1 and 2. The 
gauge field satisfies 1 but not 2, and the electromagnetic fields give 
2 but not 1. 


¹⁵ We describe the effect here in terms of an idealized, impractical experiment. 
For the actual empirical status of the Aharonov-Bohm effect, see Batelaan and 
Tonomura, Physics Today 62 (2009) 38. 
Figure b shows an analog of the Aharonov-Bohm experiment in 
differential geometry. Everywhere but at the tip, the cone has zero 
curvature, as we can see by cutting it and laying it out flat. But even 
an observer who never visits the tightly curved region at the tip can 
detect its existence, because parallel-transporting a vector around 
a closed loop can change the vector’s direction, provided that the 
loop surrounds the tip. 


In the electromagnetic example, integrating A around a closed 
loop reveals, via Stokes’ theorem, the existence of a magnetic flux 
through the loop, even though the magnetic field is zero at every 
location where A has to be sampled. In the relativistic example, 
integrating $\Gamma$ around a closed loop shows that there is curvature 
inside the loop, even though the curvature is zero at all the places 
where $\Gamma$ has to be sampled. 
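

The cone lends itself to a small numerical experiment. As an assumption 
for illustration, describe the cone by the metric $ds^2 = dr^2 + \alpha^2 r^2 d\phi^2$, 
where $0 < \alpha \le 1$ and $\alpha = 1$ is the flat plane; laid out flat, such a cone 
is a wedge of angle $2\pi\alpha$, so the missing angle is $2\pi(1-\alpha)$. For this 
metric $\Gamma^r_{\phi\phi} = -\alpha^2 r$ and $\Gamma^\phi_{r\phi} = 1/r$. The Python sketch below 
integrates the parallel-transport equation $dv^a/d\lambda = -\Gamma^a_{bc} v^b\, dx^c/d\lambda$ 
once around a circle of constant $r$, using only values of $\Gamma$ on the loop 
itself. The parametrization by $\alpha$, the function names, and the use of a 
fourth-order Runge-Kutta step are all my own choices. 

```python
import math

def transport_around_cone(alpha, r=1.0, steps=1000):
    """Parallel-transport an initially radial unit vector once around r = const.

    Returns the final components (u_r, u_phi) in the orthonormal basis,
    where u_r = v^r and u_phi = alpha * r * v^phi.
    """
    def rhs(v):
        vr, vphi = v
        # dv^r/dphi = -Gamma^r_{phi phi} v^phi ;  dv^phi/dphi = -Gamma^phi_{r phi} v^r
        return (alpha ** 2 * r * vphi, -vr / r)

    v = (1.0, 0.0)
    h = 2 * math.pi / steps
    for _ in range(steps):
        k1 = rhs(v)
        k2 = rhs((v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = rhs((v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
             v[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)
    return v[0], alpha * r * v[1]

def holonomy_angle(alpha):
    """Angle between the transported vector and the original, in [0, 2 pi)."""
    u_r, u_phi = transport_around_cone(alpha)
    return math.atan2(u_phi, u_r) % (2 * math.pi)

if __name__ == "__main__":
    alpha = 0.75
    print("rotation after one loop:", holonomy_angle(alpha))
    print("missing angle of cone  :", 2 * math.pi * (1 - alpha))
```

With $\alpha = 1$ the vector comes back unchanged, as it must in a flat plane. 
With $\alpha < 1$ it comes back rotated by the missing angle, regardless of the 
radius of the loop, even though the loop never goes near the tip. 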


The fact that $\Gamma$ is a gauge field, and therefore not locally 
observable, is simply a fancy way of expressing the ideas introduced 
on pp. 176 and 177, that due to the equivalence principle, the gravi- 
tational field in general relativity is not locally observable. This non- 
observability is local because the equivalence principle is a statement 
about local Lorentz frames. The example in figure b is non-local. 


Geodetic effect and structure of the source Example: 12 
> In section 5.5.1 on page 170, we estimated the geodetic effect 
on Gravity Probe B and found a result that was only off by a factor 
of $3\pi$. The mathematically pure form of the $3\pi$ suggests that the 
geodetic effect is insensitive to the distribution of mass inside the 
earth. Why should this be so? 


> The change in a vector upon parallel transporting it around a 
closed loop can be expressed in terms of either (1) the area inte- 
gral of the curvature within the loop or (2) the line integral of the 
Christoffel symbol (essentially the gravitational field) on the loop 
itself. Although I expressed the estimate as 1, it would have been 
equally valid to use 2. By Newton's shell theorem, the gravita- 
tional field is not sensitive to anything about its mass distribution 
other than its near spherical symmetry. The earth spins, and this 
does affect the stress-energy tensor, but since the velocity with 
which it spins is everywhere much smaller than c, the resulting 
effect, called frame dragging, is much smaller. 


b / The cone has zero intrinsic 
curvature everywhere except 
at its tip. An observer who never 
visits the tip can nevertheless 
detect its existence, because 
parallel transport around a path 
that encloses the tip causes a 
vector to change its direction. 
a / In Asteroids, space “wraps 
around.” 


b/A coffee cup is topologically 
equivalent to a torus. 


5.10 Manifolds 
This section can be omitted on a first reading. 


5.10.1 Why we need manifolds 


General relativity doesn’t assume a predefined background met- 
ric, and this creates a chicken-and-egg problem. We want to define 
a metric on some space, but how do we even specify the set of points 
that make up that space? The usual way to define a set of points 
would be by their coordinates. For example, in two dimensions we 
could define the space as the set of all ordered pairs of real numbers 
(x,y). But this doesn’t work in general relativity, because space is 
not guaranteed to have this structure. For example, in the classic 
1979 computer game Asteroids, space “wraps around,” so that if 
your spaceship flies off the right edge of the screen, it reappears 
on the left, and similarly at the top and bottom. Even before we 
impose a metric on this space, it has topological properties that dif- 
fer from those of the Euclidean plane. By “topological” we mean 
properties that are preserved if the space is thought of as a sheet 
of rubber that can be stretched in any way, but not cut or glued 
back together. Topologically, the space in Asteroids is equivalent to 
a torus (surface of a doughnut), but not to the Euclidean plane. 
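

In code, the wrap-around is just the statement that positions are taken 
modulo the size of the screen. Here is a minimal Python sketch; the 
function name and the screen dimensions are made up for illustration and 
are not meant to describe the actual game. 

```python
def wrap(x, y, width, height):
    """Identify opposite edges: (x, y) ~ (x + width, y) ~ (x, y + height).

    Python's % operator returns a result in [0, modulus) for a positive
    modulus, so positions off any edge reappear on the opposite side.
    """
    return x % width, y % height

if __name__ == "__main__":
    # hypothetical 1024 x 768 screen
    print(wrap(1030, 100, 1024, 768))   # off the right edge
    print(wrap(500, -5, 1024, 768))     # off one of the horizontal edges
```

Nothing here involves a metric. The identification of the edges is purely 
a statement about which points are the same point, which is why it counts 
as a topological property. 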


Another useful example is the surface of a sphere. In example 
11 on page 189, we calculated $\Gamma^\theta_{\phi\phi}$. A similar calculation gives 
$\Gamma^\phi_{\theta\phi} = \cot\theta$. Now consider what happens as we drive our dogsled 
north along the line of longitude $\phi = 0$, cross the north pole at 
$\theta = 0$, and continue along the same geodesic. As we cross the pole, 
our longitude changes discontinuously from 0 to $\pi$. Consulting the 
geodesic equation, we see that this happens because $\Gamma^\phi_{\theta\phi}$ blows up 
at $\theta = 0$. Of course nothing really special happens at the pole. 
The bad behavior isn’t the fault of the sphere, it’s the fault of the 
$(\theta,\phi)$ coordinates we’ve chosen, that happen to misbehave at the 
pole. Unfortunately, it is impossible to define a pair of coordinates 
on a two-sphere without having them misbehave somewhere. (This 
follows from Brouwer’s famous 1912 “Hairy ball theorem,” which 
states that it is impossible to comb the hair on a sphere without 
creating a cowlick somewhere.) 
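

The jump in longitude is easy to exhibit numerically. In the sketch below, 
the dogsled’s route is a great circle through the north pole of a unit 
sphere, described in Cartesian coordinates, where nothing discontinuous 
happens. Converting to colatitude and longitude shows the longitude 
jumping by $\pi$ at the pole. The function names and the choice of the xz 
plane for the route are my own. 

```python
import math

def point_on_route(s):
    """Point on the unit sphere at signed arc length s from the north pole,
    on the great circle in the xz plane; s < 0 is the approach along phi = 0."""
    return (-math.sin(s), 0.0, math.cos(s))

def colatitude_longitude(x, y, z):
    """Convert Cartesian coordinates to (theta, phi), with phi in [0, 2 pi)."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / r)))  # clamp against rounding
    phi = math.atan2(y, x) % (2 * math.pi)
    return theta, phi

if __name__ == "__main__":
    for s in (-0.02, -0.01, 0.01, 0.02):
        theta, phi = colatitude_longitude(*point_on_route(s))
        print("s=%+.2f  theta=%.4f  phi=%.4f" % (s, theta, phi))
```

The Cartesian coordinates of the route vary smoothly with s, while the 
longitude does not, which is the sense in which the misbehavior belongs to 
the coordinates rather than to the sphere. 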
5.10.2 Topological definition of a manifold 


This motivates us to try to define a “bare-bones” geometrical 
space in which there is no predefined metric or even any predefined 
set of coordinates. 


There is a general notion of a topological space, which is too 
general for our purposes. In such a space, the only structure we are 
guaranteed is that certain sets are defined as “open,” in the same 
sense that an interval like 0 < x < 1 is called “open.” A point in 
an open set can be moved in any direction without leaving the set. 
An open set is essentially a set without a boundary, for in a set like 
$0 \le x \le 1$, the boundary points 0 and 1 can only be moved in one 
direction without taking them outside. 


A topological space is too general for us because it can include 
spaces like fractals, infinite-dimensional spaces, and spaces that have 
different numbers of dimensions in different regions. It is neverthe- 
less useful to recognize certain concepts that can be defined using 
only the generic apparatus of a topological space, so that we know 
they do not depend in any way on the presence of a metric. An 
open set surrounding a point is called a neighborhood of that point. 
In a topological space we have a notion of getting arbitrarily close 
to a certain point, which means to take smaller and smaller neigh- 
borhoods, each of which is a subset of the last. But since there is 
no metric, we do not have any concept of comparing distances of 
distant points, e.g., that P is closer to Q than R is to S. A con- 
tinuous function is a purely topological idea; a continuous function 
is one such that for any open subset U of its range, the set V of 
points in its domain that are mapped to points in U is also open. 
Although some definitions of continuous functions talk about real 
numbers like $\epsilon$ and $\delta$, the notion of continuity doesn’t depend on 
the existence of any structure such as the real number system. A 
homeomorphism is a function that is invertible and continuous in 
both directions. Homeomorphisms formalize the informal notion of 
“rubber-sheet geometry without cutting or gluing.” If a homeomor- 
phism exists between two topological spaces, we say that they are 
homeomorphic; they have the same structure and are in some sense 
the same space. 


The more specific type of topological space we want is called a 
manifold. Without attempting any high level of mathematical rigor, 
we define an n-dimensional manifold M according to the following 
informal principles:¹⁶ 


¹⁶ For those with knowledge of topology, these can be formalized a little more: 
we want a completely normal, second-countable, locally connected topological 
space that has Lebesgue covering dimension n, is a homogeneous space under 
its own homeomorphism group, and is a complete uniform space. I don’t know 
whether this is sufficient to characterize a manifold completely, but it suffices to 
rule out all the counterexamples of which I know. 


[Figure c. Labels: topological spaces; manifolds; manifolds with metrics.] 


c / General relativity doesn’t 
assume a predefined background 
metric. Therefore all we can 
really know before we calculate 
anything is that we’re working 
on a manifold, without a metric 
imposed on it. 
M1 Dimension: M’s dimension is n. 


M2 Homogeneity: No point has any locally definable property that 
distinguishes it from any other point. 


M3 Completeness: M is complete, in the sense that specifying an 
arbitrarily small neighborhood gives a unique definition of a 
point. 


Lines Example: 13 
The set of all real numbers is a 1-manifold. Similarly, any line with 
the properties specified in Euclid’s Elements is a 1-manifold. All 
such lines are homeomorphic to one another, and we can there- 
fore speak of “the line.” 


A circle Example: 14 
A circle (not including its interior) is a 1-manifold, and it is not 
homeomorphic to the line. To see this, note that deleting a point 
from a circle leaves it in one connected piece, but deleting a point 
from a line makes two. Here we use the fact that a homeomor- 
phism is guaranteed to preserve “rubber-sheet” properties like the 
number of pieces. 


No changes of dimension Example: 15 
A “lollipop” formed by gluing an open 2-circle (i.e., a circle not 
including its boundary) to an open line segment is not a manifold, 
because there is no n for which it satisfies M1. 


It also violates M2, because points in this set fall into three distinct 
classes: those that live in 2-dimensional neighborhoods, those 
that live in 1-dimensional neighborhoods, and the point where the 
line segment intersects the boundary of the circle. 


No manifolds made from the rational numbers Example: 16 
The rational numbers are not a manifold, because specifying an 
arbitrarily small neighborhood around $\sqrt{2}$ excludes every rational 
number, violating M3. 


Similarly, the rational plane defined by rational-number coordinate 
pairs (x,y) is not a 2-manifold. It’s good that we’ve excluded 
this space, because it has the unphysical property that curves 
can cross without having a point in common. For example, the 
curve $y = x^2$ crosses from one side of the line y = 2 to the other, 
but never intersects it. This is physically undesirable because it 
doesn’t match up with what we have in mind when we talk about 
collisions between particles as intersections of their world-lines, 
or when we say that electric field lines aren’t supposed to inter- 
sect. 


No boundary Example: 17 
The open half-plane y > 0 in the Cartesian plane is a 2-manifold. 
The closed half-plane $y \ge 0$ is not, because it violates M2; the 
boundary points have different properties than the ones on the 
interior. 


Disconnected manifolds Example: 18 
Two nonintersecting lines are a 1-manifold. Physically, discon- 
nected manifolds of this type would represent a universe in which 
an observer in one region would never be able to find out about 
the existence of the other region. 


No bad glue jobs Example: 19 
Hold your hands like you’re pretending you know karate, and then 
use one hand to karate-chop the other. Suppose we want to join 
two open half-planes in this way. As long as they’re separate, 
then we have a perfectly legitimate disconnected manifold. But if 
we want to join them by adding the point P where their boundaries 
coincide, then we violate M2, because this point has special prop- 
erties not possessed by any others. An example of such a prop- 
erty is that there exist points Q and R such that every continuous 
curve joining them passes through P. (Cf. problem 5, p. 367.) 




5.10.3 Hausdorff property 


Pioneering topologist Felix Hausdorff defined the following prop- 
erty of a topological space: 


Hausdorff property: Given any two points, it is possible to find 
disjoint neighborhoods of them. 


A joke/mnemonic, which probably works best for people with a 
certain type of British accent, is that in a Hausdorff space, any 
two points can be “housed off” from one another inside their own 
nonintersecting open sets. The notion appeals strongly to our in- 
tuitive ideas about how space and time behave, and the standard 
definition of a manifold implies that it is Hausdorff. When we 
model Minkowski space using real-number coordinates, it is Haus- 
dorff. Since the equivalence principle says that spacetime is locally 
Minkowski, we could also say that it implies spacetime is Haus- 
dorff. However, general relativity allows spacetime to behave badly 
in cases such as singularities, so it is imaginable that our universe 
contains points that violate the Hausdorff property. There are in- 
teresting and physically well-motivated spacetimes, such as some 
versions of the Taub-NUT space, that are non-Hausdorff. Since we 
have no empirical data on the behavior of spacetime under the most 
extreme conditions, we cannot say whether spacetime is really Haus- 
dorff. One should maintain some skepticism about whether such an 
idealized category is even meaningful in science, since it refers to 
phenomena at arbitrarily small scales, whereas theories and mea- 
surements are limited in the scales they can deal with. A good 
discussion of the Hausdorff property as applied to relativity is given 
by Earman.¹⁷ 


¹⁷ John Earman, ‘Pruning some branches from “branching spacetimes”,’ 
pitt.edu/~jearman/Earman2008a.pdf 


d/ Example 20. 


Branching universes Example: 20 
Figure d shows spacetime diagrams of 1 + 1-dimensional uni- 
verses that branch like a tree. These are meant to be pictures 
of classical general relativity, although some of the strongest mo- 
tivation for considering such possibilities comes from attempts to 
construct a theory of quantum gravity. In such theories, it is com- 
monly expected that spacetime will have a structure at the Planck 
scale that is a kind of “quantum foam.” 


The example in d/1 is a manifold, and is Hausdorff. This is an 
example of topology change, meaning that the spacelike section 
at one time has a different topology than the section at another.¹⁸ 
Although such a branching can occur without the existence of 
any singularities, theorems by Tipler and Geroch show that other 
types of misbehavior must occur, including causality violations 
and the need for forms of matter that violate energy conditions. 


Figure d/2 is qualitatively different. Here we have formed a space- 
time by gluing together three pieces. No curvature is implied; 
these could be three pieces of Minkowski space. The spacetime 
is not a manifold, since the points at the join have different local 
properties than points elsewhere. The machinery of general rel- 
ativity breaks down in a case like this, but for example we could 
imagine that a geodesic in this spacetime could fork off into two 
different geodesics after the split. 


Yet a third possibility is to reinterpret d/2 so that there are two 
different copies of the seam. For example, we could let the portion 
of the diagram extending into the past be represented by points 
with t < 0, while the two branches continuing into the future could 
each have t > 0, so that for a given x we would have two different 
events with coordinates (t = 0, x). It would not be possible to put 
these two points into disjoint neighborhoods, so this version of the 
space is not Hausdorff. 


5.10.4 Local-coordinate definition of a manifold 


An alternative way of characterizing an n-manifold is as an 
object that can locally be described by n real coordinates. That is, 
any sufficiently small neighborhood is homeomorphic to an open set 
in the space of real-valued n-tuples of the form $(x_1, x_2, \ldots, x_n)$. For 
example, a closed half-plane is not a 2-manifold because no 
neighborhood of a point on its edge is homeomorphic to any open set in 
the Cartesian plane. 


Self-check: Verify that this alternative definition of a manifold 
gives the same answers as M1-M3 in all the examples above. 


Roughly speaking, the equivalence of the two definitions occurs 
because we’re using n real numbers as coordinates for the dimensions 
specified by M1, and the real numbers are the unique number system 
that has the usual arithmetic operations, is ordered, and is complete 
in the sense of M3. 


¹⁸ For a recent treatment, see Borde, 1994, “Topology change in classical 
general relativity,” arxiv.org/abs/gr-qc/9406053. 


As usual when we say that something is “local,” a question arises 
as to how local is local enough. The language in the definition 
above about “any sufficiently small neighborhood” is logically akin 
to the Weierstrass $\epsilon$-$\delta$ approach: if Alice gives Bob a manifold and 
a point on a manifold, Bob can always find some neighborhood 
around that point that is compatible with coordinates, but it may 
be an extremely small neighborhood. 


Coordinates on a circle Example: 21 
If we are to define coordinates on a circle, they should be 
continuous functions. The angle $\phi$ about the center therefore doesn’t 
quite work as a global coordinate, because it has a discontinuity 
where $\phi = 0$ is identified with $\phi = 2\pi$. We can get around this by 
using different coordinates in different regions, as is guaranteed 
to be possible by the local-coordinate definition of a manifold. For 
example, we can cover the circle with two open sets, one on the 
left and one on the right. The left one, L, is defined by deleting 
only the $\phi = 0$ point from the circle. The right one, R, is defined 
by deleting only the one at $\phi = \pi$. On L, we use coordinates 
$0 < \phi_L < 2\pi$, which are always a continuous function from L to 
the real numbers. On R, we use $-\pi < \phi_R < \pi$. 


In examples like this one, the sets like L and R are referred to 
as patches. We require that the coordinate maps on the different 
patches match up smoothly. In this example, we would like all 
four of the following functions, known as transition maps, to be 
continuous: 


• φ_L as a function of φ_R on the domain 0 < φ_R < π
• φ_L as a function of φ_R on the domain −π < φ_R < 0
• φ_R as a function of φ_L on the domain 0 < φ_L < π
• φ_R as a function of φ_L on the domain π < φ_L < 2π
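
The four transition maps for the circle can be written out explicitly. The following is a minimal Python sketch, not part of the original text; the function names are hypothetical, and it assumes the two charts described in example 21 (angle in (0, 2π) on L, angle in (−π, π) on R). It checks that each map is the inverse of the other and that both coordinates label the same point of the circle.

```python
import math

def phi_L_from_phi_R(phi_R):
    """Transition map from the R chart to the L chart on the overlap."""
    if 0 < phi_R < math.pi:
        return phi_R                  # upper half: the two angles agree
    if -math.pi < phi_R < 0:
        return phi_R + 2 * math.pi    # lower half: shift into (pi, 2*pi)
    raise ValueError("outside the overlap of L and R")

def phi_R_from_phi_L(phi_L):
    """Transition map from the L chart to the R chart on the overlap."""
    if 0 < phi_L < math.pi:
        return phi_L
    if math.pi < phi_L < 2 * math.pi:
        return phi_L - 2 * math.pi
    raise ValueError("outside the overlap of L and R")

def point_on_circle(phi):
    """Point of the unit circle labeled by an angle in either chart."""
    return (math.cos(phi), math.sin(phi))

samples = (-3.0, -1.0, -0.001, 0.001, 1.0, 3.0)
round_trip_ok = all(
    math.isclose(phi_R_from_phi_L(phi_L_from_phi_R(p)), p) for p in samples
)
same_point_ok = all(
    math.isclose(a, b, abs_tol=1e-12)
    for p in samples
    for a, b in zip(point_on_circle(p), point_on_circle(phi_L_from_phi_R(p)))
)
```

On each of the four domains the map is either the identity or a shift by 2π, so it is continuous there; the jump by 2π only occurs across the points that have been deleted from one patch or the other.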


The local-coordinate definition only states that a manifold can 
be coordinatized. That is, the functions that define the coordinate 
maps are not part of the definition of the manifold, so, for example, 
if two people define coordinate patches on the unit circle in different
ways, they are still talking about exactly the same manifold. 


Open line segment homeomorphic to a line Example: 22 
Let L be an open line segment, such as the open interval (0,1). L 
is homeomorphic to a line, because we can map (0, 1) to the real 
line through the function f(x) = tan(πx − π/2).


e / Example 24.

f / “Bob, your manifold isn’t smooth!”
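
The homeomorphism of example 22 can be checked numerically. This is a small Python sketch, assuming the map f(x) = tan(πx − π/2) on (0, 1); the name f_inverse is introduced here only for illustration. It confirms that f is increasing on a set of sample points and that the arctangent provides a continuous inverse.

```python
import math

def f(x):
    """Map the open interval (0, 1) onto the real line."""
    return math.tan(math.pi * x - math.pi / 2)

def f_inverse(y):
    """Continuous inverse, mapping the real line back onto (0, 1)."""
    return (math.atan(y) + math.pi / 2) / math.pi

xs = [0.001, 0.1, 0.25, 0.5, 0.75, 0.9, 0.999]
ys = [f(x) for x in xs]
increasing = all(y1 < y2 for y1, y2 in zip(ys, ys[1:]))
round_trip_ok = all(math.isclose(f_inverse(f(x)), x) for x in xs)
```

Because both f and its inverse are continuous and one-to-one, the pair satisfies the definition of a homeomorphism between (0, 1) and the real line.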


Closed line segment not homeomorphic to a line Example: 23 
A closed line segment (which is not a manifold) is not homeomor- 
phic to a line. If we map it to a line, then the endpoints have to 
go to two special points A and B. There is then no way for the 
mapping to visit the points exterior to the interval [A, B] without 
visiting A and B more than once. 


Open line segment not homeomorphic to the interior of a circle 
Example: 24 

If the interior of a circle could be mapped by a homeomorphism f 
to an open line segment, then consider what would happen if we 
took a closed curve lying inside the circle and found its image. By 
the intermediate value theorem, f would not be one-to-one, but 
this is a contradiction since f was assumed to be a homeomor- 
phism. This is an example of a more general fact that homeomor- 
phism preserves the dimensionality of a manifold. 


5.10.5 Differentiable manifolds 


A differentiable manifold means a manifold with enough extra 
structure so you can do calculus on it, but this extra structure 
doesn’t necessarily include anything as fancy as a metric. As a 
concrete example, suppose that in a 1+1-dimensional Galilean universe,
observer Alice constructs a global coordinate system (t, x).
Her spacetime is clearly a manifold, based on the local-coordinate 
definition, and this is true even though Galilean spacetime doesn’t 
have a metric. Meanwhile, observer Bob constructs his own coordinate
system (t′, x′). But something disturbing happens when Alice
constructs the transition map from Bob’s coordinate grid to hers. 
As shown in figure f, Bob’s grid has a kink in it. “Bob,” says Alice, 
“something is wrong with your coordinate system. I hypothesize 
that at a certain time, which we can call t = 0, an invisible gi- 
ant struck your body with an invisible croquet mallet and suddenly 
changed your state of motion.” “No way, Alice,” Bob answers. “I 
didn’t feel anything happen at t = 0. I think you’re the one who 
got whacked.” 
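
The kink in the dialogue above can be made concrete with a toy model. The following Python sketch is an assumption made for illustration, not a reconstruction of figure f: Bob's position coordinate is taken to be a Galilean boost x′ = x − vt whose velocity v jumps at t = 0. The transition map is continuous there, but its one-sided slopes disagree, so it is not differentiable.

```python
def bob_x(t, x, v_before=0.0, v_after=0.5):
    """Bob's coordinate x' in terms of Alice's (t, x); v jumps at t = 0."""
    v = v_before if t < 0 else v_after
    return x - v * t

def one_sided_slopes(func, t0, x, h=1e-6):
    """Left and right finite-difference slopes of func with respect to t."""
    left = (func(t0, x) - func(t0 - h, x)) / h
    right = (func(t0 + h, x) - func(t0, x)) / h
    return left, right

left_at_kink, right_at_kink = one_sided_slopes(bob_x, 0.0, 1.0)
left_later, right_later = one_sided_slopes(bob_x, 1.0, 1.0)
```

At t = 0 the two slopes differ by the size of the velocity jump, while at any other time they agree. Nothing in the map itself says which observer was the one who changed their state of motion.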


By a differentiable manifold we mean one in which this sort of 
controversy never happens. The manifold comes with a collection
of local coordinate systems, called charts, and wherever these charts 
overlap, the transition map is differentiable. Every coordinate is a 
differentiable function of every other coordinate. In fact, we will as- 
sume for convenience that not just the first derivative but derivatives 
of all orders are defined. This makes our manifold not just a dif- 
ferentiable manifold but a smooth manifold. This definition sounds 
coordinate-dependent, but it isn’t. Our collection of charts (called 
an atlas) can contain infinitely many possible coordinate systems; 
we can even specify that it contains all possible coordinate systems 
that could be obtained from one another by any diffeomorphism.


5.10.6 The tangent space


We now formalize the intuitive notion of a tangent vector (p. 88),
following Nowik and Katz.¹⁹ Let M be an n-dimensional smooth
manifold, so that locally it looks like Euclidean space, describable
by real-number coordinates x, y, ... We now enhance M to form a
new topological space, in which the coordinates can include not only
real numbers, but numbers that differ infinitesimally from reals, as
outlined in example 3 on p. 94. From now on when we say things
like “the manifold,” we mean this enhanced version.²⁰ Fix some
infinitesimal number ε once and for all, and define the notation
x = O(ε) to mean that x/ε is not infinite.²¹


Points in the manifold are considered close if the Euclidean distance
between them in coordinate space is O(ε). This definition
sounds coordinate-dependent, but isn’t, and sounds like it’s assuming
an actual Euclidean metric, but isn’t.²² Define a prevector at
point P as a pair (P, Q) of points that are close, figure g/1. Define
prevectors to be equivalent if the difference between them is
infinitesimal even compared to ε.


Definition: A tangent vector at point P is the set of all prevectors 
at P that are equivalent to a particular prevector at P. 


The tangent space T_P is the set of all tangent vectors at P. The
tangent space has the structure of a vector space over the reals 
simply by using the coordinate differences to define the vector-space 
operations, just as we would do if (P,Q) meant an arrow extending 
from P to Q, as in freshman physics. 


In practice, we don’t really care about the details of the con- 
struction of the tangent space, and different people don’t even have 
to use the same construction. All we care about is that the tangent 
space has a certain structure. In particular, it has n dimensions, as 
we would expect intuitively. Since we’re going to forget the details 
of the construction, it doesn’t matter that we’ve made all tangent 
vectors infinitesimal by definition. The vector space’s internal structure
only has to do with how big the vectors are compared to each
other. (If we wanted to, we could scale up all the tangent vectors
by a factor of 1/ε.) This justifies the visualization in figure g/2.


19 “Differential geometry via infinitesimal displacements,” arxiv.org/abs/1405.0984

20 It is possible to define a different and larger enhancement, called *M, that
would include points with infinitely large coordinates. For example, suppose we
have a coordinate patch with bounds on the coordinates that can be written
down using inequalities, t > 0, 0 < θ < π/4, ... Then *M would contain any
finite, infinitesimal, and infinite values of (t, θ, ...) satisfying these inequalities,
and this would include infinite values of t. We will not do this here, because
the inclusion of idealized points at infinity is more useful in relativity if we do it
using a different approach, discussed in section 7.3.4, p. 274.

21 As usual in this type of “big O” notation, we abuse the equals sign somewhat.
In particular, the equals sign here is not symmetric. For more detail, see the
Wikipedia article “Big O notation.”

22 An equivalent and manifestly coordinate-independent definition is that for
every smooth real function in a neighborhood of the points, the function differs
at these points by an amount that is O(ε).


g / 1. A tangent vector can be thought of as an infinitesimal
displacement. 2. For a sphere embedded in three-space, the
space of tangent vectors is visualized as a plane tangent to the
sphere at a certain point.


Actually it’s not quite true that we only care about the tangent 
space’s internal structure, because then we could have avoided the 
fancy definition and simply used the ordinary vector space consisting 
of n-tuples of real numbers. The fancy definition is needed because 
it ties the tangent space in a natural way to the structure of the 
manifold at a particular point. Therefore it will allow us (1) to define 
parallel transport, which brings a vector from one tangent space to 
another, and (2) to define components of vectors in a particular 
coordinate system. 


For an alternative definition of the tangent space, see ch. 2 of
Carroll.²³ Briefly, this involves taking a tangent vector to be something
that behaves like a directional derivative. In particular, a
partial derivative with respect to a coordinate such as ∂/∂x qualifies
as a tangent vector, which we think of as pointing in the x
direction. The set of such coordinate derivatives forms a basis for
the tangent space and gives a convenient way of notating tangent
vectors. We will find this notation convenient in section 7.1, p. 261.


5.11 Units in general relativity 
This section is optional. 


Analyzing units, also known as dimensional analysis, is one of 
the first things we learn in freshman physics. It’s a useful way of 
checking our math, and it seems as though it ought to be straight- 
forward to extend the technique to relativity. It certainly can be 
done, but it isn’t quite as trivial as might be imagined, and it leads 
to some surprising new physical ideas. 


One of our most common jobs is to change from one set of units 
to another, but in relativity it becomes nontrivial to define what we 
mean by the notion that our units of measurement change or don’t 
change. We could, e.g., appeal to an atomic standard, but Dicke²⁴
points out that this could be problematic. Imagine, he says, that 


you are told by a space traveller that a hydrogen atom on 
Sirius has the same diameter as one on the earth. A few 
moments’ thought will convince you that the statement 
is either a definition or else meaningless. 


To start with, we note that abstract index notation is more convenient
than concrete index notation for these purposes. Concrete
index notation assigns different units to different components of a
tensor if we use coordinates, such as spherical coordinates (t, r, θ, φ),
that don’t all have units of length. In abstract index notation, a
symbol like v^a stands for the whole vector, not for one of its components.


23 Lecture Notes on General Relativity, http://ned.ipac.caltech.edu/level5/March01/Carroll3/Carroll_contents.html.

24 “Mach’s principle and invariance under transformation of units,” Phys Rev
125 (1962) 2163


In concrete index notation, it also doesn’t necessarily make sense
to talk about rescaling. E.g., for polar coordinates in the Euclidean
plane, the transformation (r, θ) → (2r, 2θ) doesn’t have any interesting
interpretation, and can’t even be applied globally. In abstract
index notation, we can say v^a → 2v^a, and this simply means that
the vector v^a has been scaled up by a factor of 2.


Since abstract index notation does not even offer us a notation 
for components, if we want to apply dimensional analysis we must 
define a system in which units are attributed to a tensor as a whole. 
Suppose we write down the abstract-index form of the equation for
proper time:

ds^2 = g_{ab} dx^a dx^b


In abstract index notation, dx^a doesn’t mean an infinitesimal change
in a particular coordinate, it means an infinitesimal displacement
vector.²⁵ This equation has one quantity on the left and three factors
on the right. Suppose we assign these parts of the equation
units [ds] = L^σ, [g_{ab}] = L^{2γ}, and [dx^a] = [dx^b] = L^ξ, where square
brackets mean “the units of” and L stands for units of length. We
then have σ = γ + ξ. Due to the ambiguities referred to above, we
can pick any values we like for these three constants, as long as they
obey this rule. I find (σ, γ, ξ) = (1, 0, 1) to be natural and convenient,
but Dicke, in the above-referenced paper, likes (1, 1, 0), while
the mathematician Terry Tao advocates (0, 1, −1).
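
The consistency rule for the three exponents is easy to check mechanically. The following Python sketch uses hypothetical names and includes only the two conventions that are stated unambiguously above, plus one deliberately inconsistent assignment for contrast.

```python
def is_consistent(sigma, gamma, xi):
    """Check [ds]^2 = [g_ab][dx^a][dx^b], i.e. 2*sigma = 2*gamma + 2*xi."""
    return 2 * sigma == 2 * gamma + 2 * xi

conventions = {
    "this book": (1, 0, 1),   # units carried by ds and dx^a
    "Dicke": (1, 1, 0),       # units carried by ds and g_ab
}
results = {name: is_consistent(*exps) for name, exps in conventions.items()}
bad_example = is_consistent(1, 1, 1)   # violates sigma = gamma + xi
```

Any triple of exponents that satisfies the rule is an acceptable bookkeeping convention; the physics does not single one out.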


Suppose we raise and lower indices to form a tensor with r upper
indices and s lower indices. We refer to this as a tensor of rank
(r, s). (We don’t count contracted indices, e.g., u^a v_a is a rank-(0, 0)
scalar.) Since the metric is the tool we use for raising and lowering
indices, and the units of the lower-index form of the metric are L^{2γ},
it follows that the units vary in proportion to L^{γ(s−r)}. In general,
you can assign a physical quantity units L^u that are a product of
two factors, a “kinematical” or purely geometrical factor L^k, where
k = γ(s − r), and a dynamical factor L^d ..., which can depend on
what kind of quantity it is, and where the ... indicates that if your
system of units has more than just one base unit, those can be in
there as well. Dicke uses units with ħ = c = 1, for example, so
there is only one base unit, and mass has units of inverse length and
d_mass = −1. In general relativity it would be more common to use
units in which G = c = 1, which instead give d_mass = +1.


25 For a modern and rigorous development of differential geometry along these
lines, see Nowik and Katz, arxiv.org/abs/1405.0984.


The units of momentum Example: 25
Consider the equation

p^a = m v^a


for the momentum of a material particle. Suppose we use special- 
relativistic units in which c = 1, but because gravity isn’t incorpo- 
rated into the theory, G plays no special role, and it is natural to 
use a system of units in which there is a base unit of mass M. 


The kinematic units check out, because k_p = k_m + k_v:


γ(−1) = γ(0) + γ(−1)


This is merely a matter of counting indices, and was guaranteed 
to check out as long as the indices were written in a grammatical 
way on both sides of the equation. What this check is essentially 
telling us is that if we were to establish Minkowski coordinates in 
a neighborhood of some point, and do a change of coordinates 
(t, x, y, z) → (αt, αx, αy, αz), then the quantities on both sides
of the equation would vary under the tensor transformation laws
according to the same exponent of α. For example, if we changed
from meters to centimeters, the equation would still remain valid. 


For the dynamical units, suppose that we use (σ, γ, ξ) = (1, 0, 1),
so that an infinitesimal displacement dx^a has units of length L, as
does proper time ds. These two quantities are purely kinematic,
so we don’t assign them any dynamical units, and therefore the
velocity vector v^a = dx^a/ds also has no dynamical units. Our
choice of a system of units gives [m] = M. We require that the
equation p^a = m v^a have dynamical units that check out, so:


M = 1 · M


We must also assign units of mass to the momentum. 
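
The two checks in example 25 can be spelled out as a short computation. This Python sketch is illustrative only; the helper names are hypothetical, the kinematic exponent is taken to be k = γ(s − r) for a tensor with r upper and s lower indices, and dynamical units are represented as dictionaries mapping a base unit to its exponent.

```python
def kinematic_exponent(r, s, gamma):
    """Exponent k in L^k for a tensor with r upper and s lower indices."""
    return gamma * (s - r)

def multiply_units(*factors):
    """Multiply dynamical units given as {base unit: exponent} dictionaries."""
    result = {}
    for units in factors:
        for base, exponent in units.items():
            result[base] = result.get(base, 0) + exponent
    return {base: e for base, e in result.items() if e != 0}

# Kinematic check for p^a = m v^a: rank (1,0) = rank (0,0) times rank (1,0).
kinematic_ok = all(
    kinematic_exponent(1, 0, gamma)
    == kinematic_exponent(0, 0, gamma) + kinematic_exponent(1, 0, gamma)
    for gamma in (-1, 0, 1)
)

# Dynamical check: mass carries the base unit M, velocity carries none.
mass_units = {"M": 1}
velocity_units = {}
momentum_units = multiply_units(mass_units, velocity_units)
```

The kinematic check succeeds for every value of γ, which is the sense in which it amounts to counting indices, while the dynamical check is what forces the momentum to carry units of mass.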


A system almost identical to this one, but with different terminology,
is given by Schouten.²⁶


For practical purposes in checking the units of an equation, we 
can see from example 25 that worrying about the kinematic units 
is a waste of time as long as we have checked that the indices are 
grammatical. We can therefore give a simplified method that suffices 
for checking the units of any equation in abstract index notation. 


1. We assign a tensor the same units that one of its concrete
components would have if we were to adopt (local) Minkowski
coordinates, in the system with (σ, γ, ξ) = (1, 0, 1). These
are the units we would automatically have imputed to it after
learning special relativity but before learning about tensors or
fancy coordinate transformations. Since γ = 0, the positions
of the indices do not affect the result.


2. The units of a sum are the same as the units of the terms.


3. The units of a tensor product are the product of the units of
the factors.


26 Tensor Analysis for Physicists, ch. VI

Our splitting of units into kinematic and dynamical parts can be 
understood as arising naturally from the following geometrical and 
physical considerations. In section 3.2.3, p. 90, we introduced the 
notion of a connection, which is a rule that relates tensors living in 
one local region of spacetime to those in another region, depending 
on the path used for parallel transport. The connection is embod- 
ied concretely in the Christoffel symbols, and we need it in order 
to define sensible derivatives of vectors, because otherwise we lack 
the information needed in order to tell whether a vector is in fact 
constant, and only changing its components due to the way the co- 
ordinate system is defined. The connection and the metric embody 
a lot of the same geometrical information. If we know the metric, 
we can always find the connection (sec. 5.9.1, p. 188). 


We might then naturally ask whether it is possible to go in the
other direction. Given the connection, can we find the metric? But
this is clearly not possible, because the connection doesn’t carry any
information about units of measurement, while the metric does. In
fact, if the metric g results in a certain connection Γ, then so will
the metric Ω²g, where Ω is a real constant.²⁷ One way of thinking
about the transformation g → Ω²g is that in the expression
ds² = g_{ab} dx^a dx^b for proper time, we scale up any clock reading s
by a factor of Ω. This helps to explain Dicke’s preference for the
convention (σ, γ, ξ) = (1, 1, 0), according to which the units are attributed
to ds and g, while vectors are considered to be unitless. A
further advantage of this system is that it can be adapted to concrete
index notation, because we simply declare coordinates to be
unitless names for points.


27 If we multiplied g by a negative constant, then we would change the signature,
e.g., from +−−− to −+++. Changing the signature would be particularly
goofy in the context of Riemannian geometry, where it is customary to have a
positive-definite metric.


The following table summarizes the factors by which various
quantities change under rescaling of the lower-index metric and
rescaling of local Minkowski coordinates x^μ. As above, r is the
number of upper indices and s the number of lower indices. Entries 
in lighter text follow from the more general rule. A curvature mono- 
mial of order p is an expression formed from the multiplication of p 
curvature tensors, possibly with contracted indices. 


                                   g_ab → Ω² g_ab | x^μ → α x^μ
g Oeen a’ 
tensor density of rank (r,s) and qos 
weight w 
are 1 aw! 
curvature monomial of order p QP a’ * 


It makes sense that rescaling the metric doesn’t change the Christof- 
fel symbols, because it doesn’t change the connection or the coordi- 
nates, and therefore shouldn’t change the geodesic equation. Veri- 
fying the other entries in the table is a good exercise. 
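
The invariance of the Christoffel symbols under a constant rescaling of the metric can be spot-checked numerically. The sketch below is an illustration rather than a proof: it assumes the Euclidean plane in polar coordinates (r, θ) as a test case, estimates the derivatives of the metric by central differences, and uses hypothetical function names. The index convention is gamma[a][b][c] for Γ^a_bc, with index 0 for r and 1 for θ.

```python
def christoffel(metric, x, h=1e-5):
    """Numerically estimate gamma[a][b][c] = Gamma^a_{bc} for a 2D metric.

    `metric` maps a coordinate list [x0, x1] to a 2x2 nested list g_ab.
    """
    n = 2
    g = metric(x)
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    # dg[c][a][b] approximates the partial derivative of g_ab along x^c.
    dg = []
    for c in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[c] += h
        x_minus[c] -= h
        g_plus, g_minus = metric(x_plus), metric(x_minus)
        dg.append([[(g_plus[a][b] - g_minus[a][b]) / (2 * h)
                    for b in range(n)] for a in range(n)])
    # Gamma^a_{bc} = (1/2) g^{ad} (d_b g_dc + d_c g_db - d_d g_bc)
    return [[[0.5 * sum(g_inv[a][d] * (dg[b][d][c] + dg[c][d][b] - dg[d][b][c])
                        for d in range(n))
              for c in range(n)] for b in range(n)] for a in range(n)]

def polar_metric(x):
    """Euclidean plane in polar coordinates: ds^2 = dr^2 + r^2 dtheta^2."""
    r, theta = x
    return [[1.0, 0.0], [0.0, r * r]]

def rescaled(metric, factor):
    """Return the metric multiplied by a constant factor (playing Omega^2)."""
    return lambda x: [[factor * entry for entry in row] for row in metric(x)]

point = [2.0, 0.3]
gamma = christoffel(polar_metric, point)
gamma_scaled = christoffel(rescaled(polar_metric, 9.0), point)
max_difference = max(
    abs(gamma[a][b][c] - gamma_scaled[a][b][c])
    for a in range(2) for b in range(2) for c in range(2)
)
```

The constant factor cancels between the inverse metric and the derivatives of the metric, so the two sets of symbols agree to within rounding error.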


A change of signature Example: 26 
Suppose that we change the signature of a metric from +−−−
to −+++ or vice versa. Although the notation Ω² was intended
to imply that the signature of the metric would not be changed,
nothing goes wrong in the logic if we take Ω² = −1. According
to the table, the lower-index form of the metric, with (r, s) = (0, 2),
changes by a factor of −1, which is what we set out to do. A
curvature polynomial of order p changes by a factor of (−1)^p. As
a specific example, a cosmological model dominated by the cos- 
mological constant (sec. 8.2.7, p. 341) has Ricci scalar R = —12/A 
in the + — —— signature used in this book, but R = +12A in the 
— +++ signature. 


Curvature scalars for the Godel metric Example: 27 
The Ricci scalar R = R^a_a is a curvature monomial of order 1.
Because it is a relativistic scalar, its value is invariant under a
change of coordinates. A scalar constructed in this way from a
curvature tensor is called a curvature scalar. In the system described
above, it is a curvature monomial of order 1, and it is a
tensor of rank (0, 0). It is a pure tensor, i.e., it is a tensor density
in only the trivial sense, having weight w = 0.


The Kretschmann invariant K = R^{abcd} R_{abcd}, discussed in more
detail on p. 236, is a curvature monomial of order 2, with properties
that are otherwise similar to the ones listed above for the
Ricci scalar.


To have a specific example to talk about, let us consider the metric

ds^2 = dt^2 − dx^2 − dy^2 + (1/2) e^{2x} dz^2 − 2 e^x dz dt.


This is the historically and philosophically important Gödel metric,
discussed on p. 325. A calculation using Maxima gives R = 1
(+−−− signature) and K = 3. (The fact that both of these are
constant shows that the spacetime is highly symmetric, although
this is not manifest when the metric is expressed in these coordinates.)
Suppose that we recalibrate our clocks to use different
units, changing the metric above according to ds² → Ω² ds².
Then application of the rules given in the table tells us that R =
Ω^{−2} and K = 3Ω^{−4}.
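
The rescaling used in example 27 can be written as a one-line rule. This Python sketch assumes, as in that example, that a curvature scalar built from p curvature tensors picks up one factor of Ω^{−2} per curvature tensor; the function name is hypothetical, and the starting values R = 1 and K = 3 are the ones quoted in the text.

```python
def rescale_curvature_scalar(value, order, omega):
    """Value of an order-p curvature scalar after ds^2 -> omega^2 ds^2."""
    return value * omega ** (-2 * order)

omega = 2.0
ricci_scalar = rescale_curvature_scalar(1.0, order=1, omega=omega)
kretschmann = rescale_curvature_scalar(3.0, order=2, omega=omega)
```

Doubling every clock reading therefore divides R by 4 and K by 16, consistent with curvature having units of inverse length squared.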


To round out our discussion of this approach, we state more 
precisely the relationship between the metric and the connection. 
Given a metric, there is a unique torsion-free connection. Given a 
torsion-free connection, there may or may not exist a metric that 
gives rise to that connection. If such a metric does exist, then except 
in exceptional cases that metric is unique up to a nonzero multiplica- 
tive constant. The reason for the uniqueness of the metric up to a 
constant factor is as follows. Suppose we fix the metric at one point 
on our manifold. Then by using the connection we can parallel- 
transport the metric tensor to other points on the manifold, so that 
defining it at one point has the effect of defining it everywhere. But 
there may be a lack of consistency, because parallel transport is 
path-dependent. In particular, if we transport the metric around 
a closed loop, we want to recover the original metric. This con- 
sistency requirement is usually enough to rule out any freedom in 
defining the metric beyond a global scaling factor. A more complete
treatment of this problem is given by Schmidt.²⁸


An interesting exceptional case is flat spacetime. Because there 
is no curvature, parallel transport around a closed loop never changes 
the metric, so the consistency requirement is automatically satisfied,
and our freedom in choosing a metric is greater than just the
ability to scale by a constant. In particular, some authors choose
not to use natural units, so that instead of g = diag(1, −1, −1, −1)
in Cartesian coordinates, one has g = diag(c², −1, −1, −1). In
an approach where a change of units is represented by a change
of coordinates, this change in the metric could be represented by
(t, x, y, z) → (t/c, x, y, z). But in the convention followed by Dicke,
we would take the coordinates to be immutable labels for points, and 
these would actually be physically different metrics, with different 
light cones. 


A similar example in a Riemannian context is the Euclidean 
plane, in which the (trivial) connection is consistent with any metric of
the form given in example 9, p. 104. 


Finally, we note that it can be of interest to generalize the transformation
g → Ω²g so that Ω can vary from point to point. This is
called a conformal transformation. Conformal transformations can
be used for a variety of purposes, including nontrivial physics (as in
the Dicke paper) and techniques for visualization (sec. 7.3.4, p. 274).


28 projecteuclid.org/euclid.cmp/1103858479


Problems


1 Suppose that we change the metric by a nonzero constant
factor, g → ag. We do not rule out a < 0, in which case the signature
of the metric changes. Determine the effect on the Christoffel
symbols and on the geodesic equation, and explain why this makes 
sense. > Solution, p. 392 


2 Show, as claimed on page 189, that for polar coordinates in
a Euclidean plane, Γ^r_θθ = −r and Γ^θ_rθ = 1/r.


3 In 1+1 dimensions, let the metric be ds? = + dé? = +e", 
where @ is an angle running around the circle. Calculate all the 
nonvanishing Christoffel symbols by hand. These will be used in 
example 4 on p. 246, where we investigate some further properties 
of this interesting spacetime. > Solution, p. 392 


4 Partial derivatives commute with partial derivatives. Co- 
variant derivatives don’t commute with covariant derivatives. Do 
covariant derivatives commute with partial derivatives? 


5 Show that if the differential equation for geodesics on page 
179 is satisfied for one affine parameter λ, then it is also satisfied for
any other affine parameter λ′ = aλ + b, where a and b are constants.


6 Equation [2] on page 111 gives a flat-spacetime metric in 
rotating polar coordinates. (a) Verify by explicit computation that 
this metric represents a flat spacetime. (b) Reexpress the metric in 
rotating Cartesian coordinates, and check your answer by verifying 
that the Riemann tensor vanishes. 


7 The purpose of this problem is to explore the difficulties 
inherent in finding anything in general relativity that represents a 
uniform gravitational field g. In example 11 on page 58, we found, 
based on elementary arguments about the equivalence principle and 
photons in elevators, that gravitational time dilation must be given 
by e^Φ, where Φ = gz is the gravitational potential. This results in
a metric


[1] ds^2 = e^{2gz} dt^2 − dz^2.

On the other hand, example 19 on page 140 derived the metric

[2] ds^2 = (1 + gz)^2 dt^2 − dz^2


by transforming from a Lorentz frame to a frame whose origin moves 
with constant proper acceleration g. (These are known as Rindler 
coordinates.) Prove the following facts. None of the calculations 
are so complex as to require symbolic math software, so you might 
want to perform them by hand first, and then check yourself on a 
computer. 

(a) The metrics [1] and [2] are approximately consistent with one
another for z near 0.

(b) When a test particle is released from rest in either of these met- 
rics, its initial proper acceleration is g. 

(c) The two metrics are not exactly equivalent to one another under 
any change of coordinates. 

(d) Both spacetimes are uniform in the sense that the curvature is 
constant. (In both cases, this can be proved without an explicit 
computation of the Riemann tensor.) 


Remark: The incompatibility between [1] and [2] can be interpreted as showing 
that general relativity does not admit any spacetime that has all the global 
properties we would like for a uniform gravitational field. This is related to Bell’s 
spaceship paradox (example 15, p. 65). Some further properties of the metric [1] 
are analyzed in subsection 7.5 on page 285. > Solution, p. 393 


8 In a topological space T, the complement of a subset U is 
defined as the set of all points in T that are not members of U. A 
set whose complement is open is referred to as closed. On the real 
line, give (a) one example of a closed set and (b) one example of 
a set that is neither open nor closed. (c) Give an example of an 
inequality that defines an open set on the rational number line, but 
a closed set on the real line. 


9 Prove that a double cone (e.g., the surface r = z in cylindrical 
coordinates) is not a manifold. > Solution, p. 393 
10 Prove that a torus is a manifold. > Solution, p. 393 
11 Prove that a sphere is not homeomorphic to a torus. 


> Solution, p. 394 


12 Curvature on a Riemannian space in 2 dimensions is a 
topic that goes back to Gauss and has a simple interpretation: the 
only intrinsic measure of curvature is a single number, the Gaussian 
curvature. What about 1+1 dimensions? The simplest metrics I 
can think of are of the form ds² = dt² − f(t)dx². (Something like
ds² = f(t)dt² − dx² is obviously equivalent to Minkowski space under
a change of coordinates, while ds² = f(x)dt² − dx² is the same as
the original example except that we’ve swapped x and t.) Playing
around with simple examples, one stumbles across the seemingly
mysterious fact that the metric ds² = dt² − t²dx² is flat, while ds² =
dt² − t dx² is not. This seems to require some simple explanation.
Consider the metric ds² = dt² − t^p dx².

(a) Calculate the Christoffel symbols by hand. 

(b) Use a computer algebra system such as Maxima to show that 
the Ricci tensor vanishes only when p = 2. 

Remark: The explanation is that in the case p = 2, the x coordinate is expanding
in proportion to the t coordinate. This can be interpreted as a situation in which
our length scale is defined by a lattice of test particles that expands inertially.
Since their motion is inertial, no gravitational fields are required in order to
explain the observed change in the length scale; cf. the Milne universe, p. 332.
> Solution, p. 394


13 Example 6 on p. 167 discussed some examples in electrostatics
where the charge density on the surface of a conductor depends
on the Gaussian curvature, when the curvature is positive.
In the case of a knife-edge formed by two half-planes at an exterior
angle β > π, there is a standard result²⁹ that the charge density
at the edge blows up to infinity as R^{π/β−1}. Does this match up
with the hypothesis that Gaussian curvature determines the charge
density? > Solution, p. 394


14 Suppose that we have found a solution x(λ) of the geodesic
equation for a timelike geodesic, but λ is not the proper time. How
can we relate λ to proper time? > Solution, p. 394


29 Jackson, Classical Electrodynamics




Chapter 6 
Vacuum Solutions 


In this chapter we investigate general relativity in regions of space 
that have no matter to act as sources of the gravitational field. 
We will not, however, limit ourselves to calculating spacetimes in 
cases in which the entire universe has no matter. For example, 
we will be able to calculate general-relativistic effects in the region 
surrounding the earth, including a full calculation of the geodetic 
effect, which was estimated in section 5.5.1 only to within an order 
of magnitude. We can have sources, but we just won’t describe the 
metric in the regions where the sources exist, e.g., inside the earth. 
The advantage of accepting this limitation is that in regions of empty
space, we don’t have to worry about the details of the stress-energy
tensor or how it relates to curvature. As should be plausible based
on the physical motivation given in section 5.1, page 160, the field
equations in a vacuum are simply R_ab = 0.


a / A Swiss commemorative coin shows the vacuum field equation.


6.1 Event horizons 


One seemingly trivial way to generate solutions to the field equations 
in vacuum is simply to start with a flat Lorentzian spacetime and do 
a change of coordinates. This might seem pointless, since it would 
simply give a new description (and probably a less convenient and 
descriptive one) of the same old, boring, flat spacetime. It turns 
out, however, that some very interesting things can happen when 
we do this. 


6.1.1 The event horizon of an accelerated observer 


Consider the uniformly accelerated observer described in exam- 
ples 4 on page 126 and 19 on page 140. Recalling these earlier results, 
we have for the ship’s equation of motion in an inertial frame

x = \frac{1}{a}\left(\sqrt{1 + a^2 t^2} - 1\right)

and for the metric in the ship’s frame

g′_{t′t′} = (1 + ax′)^2
g′_{x′x′} = −1.

Since this metric was derived by a change of coordinates from a flat-space
metric, and the Ricci curvature is an intrinsic property, we
expect that this one also has zero Ricci curvature. This is straightforward
to verify. The nonvanishing Christoffel symbols are

Γ^{t′}_{x′t′} = a/(1 + ax′) and Γ^{x′}_{t′t′} = a(1 + ax′).

The only elements of the Riemann tensor that look like they might
be nonzero are R^{t′}_{x′t′x′} and R^{x′}_{t′x′t′}, but both of these in fact vanish.


a / A spaceship (curved world-line) moves with an acceleration
perceived as constant by its passengers. The photon (straight
world-line) comes closer and closer to the ship, but will never
quite catch up.


Self-check: Verify these facts. 
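
One way to carry out this check by machine is sketched below, using the ctensor package that comes with Maxima. This is only a sketch under some assumptions of mine: the primes on t’ and x’ are dropped, and two flat dimensions y and z are appended purely so that the program has the same four-coordinate shape as a standard ctensor calculation.

```maxima
load(ctensor);
/* ship-frame coordinates; t and x stand for t' and x' */
ct_coords:[t,x,y,z];
/* g_tt = (1+a*x)^2 and g_xx = -1, padded with two flat dimensions */
lg:matrix([(1+a*x)^2,0,0,0],
          [0,-1,0,0],
          [0,0,-1,0],
          [0,0,0,-1]);
cmetric();
christof(mcs);   /* display the nonvanishing Christoffel symbols */
riemann(true);   /* display any nonvanishing Riemann components */
```

If the calculation goes through as expected, the only Christoffel symbols displayed should be the two quoted above, and no component of the Riemann tensor should survive.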


This seemingly routine exercise now leads us into some very
interesting territory. Way back on page 12, we conjectured that not all
events could be time-ordered: that is, that there might exist events
1 and 2 in spacetime such that 1 cannot cause 2, but neither can 2
cause 1. We now have enough mathematical tools at our disposal
to see that this is indeed the case.


We observe that $x(t)$ approaches the asymptote $x = t - 1/a$.
This asymptote has a slope of 1, so it can be interpreted as the 
world-line of a photon that chases the ship but never quite catches 
up to it. Any event to the left of this line can never have a causal 
relationship with any event on the ship’s world-line. Spacetime, as 
seen by an observer on the ship, has been divided by a curtain into 
two causally disconnected parts. This boundary is called an event 
horizon. Its existence is relative to the world-line of a particular 
observer. An observer who is not accelerating along with the ship 
does not consider an event horizon to exist. Although this particular 
example of the indefinitely accelerating spaceship has some physi- 
cally implausible features (e.g., the ship would have to run out of 
fuel someday), event horizons are real things. In particular, we will 
see in section 6.3.2 that black holes have event horizons. 
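
As a worked check of the asymptote (a sketch that assumes the equation of motion $x = (1/a)(\sqrt{1+a^2t^2}-1)$ for the ship and $t > 0$), expanding the square root for large $t$ gives

```latex
x(t) = \frac{1}{a}\left(\sqrt{1+a^2t^2}-1\right)
     = t\sqrt{1+\frac{1}{a^2t^2}} - \frac{1}{a}
     = t - \frac{1}{a} + \frac{1}{2a^2 t} + \ldots
```

The gap between the ship and the line $x = t - 1/a$ is positive and falls off like $1/2a^2t$, so the photon gets arbitrarily close to the ship but never reaches it.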


Interpreting everything in the $(t', x')$ coordinates tied to the ship,
the metric’s component $g'_{t't'}$ vanishes at $x' = -1/a$. An observer
aboard the ship reasons as follows. If I start out with a head-start
of $1/a$ relative to some event, then the timelike part of the metric at
that event vanishes. If the event marks the emission of a material
particle, then there is no possible way for that particle’s world-line
to have $ds^2 > 0$. If I were to detect a particle emitted at that event,
it would violate the laws of physics, since material particles must
have $ds^2 > 0$, so I conclude that I will never observe such a particle.
Since all of this applies to any material particle, regardless of its
mass $m$, it must also apply in the limit $m \to 0$, i.e., to photons and
other massless particles. Therefore I can never receive a particle
emitted from this event, and in fact it appears that there is no way
for that event, or any other event behind the event horizon, to have
any effect on me. In my frame of reference, it appears that light
cones near the horizon are tipped over so far that their future
light-cones lie entirely in the direction away from me.


We’ve already seen in example 14 on page 64 that a naive New- 
tonian argument suggests the existence of black holes; if a body is 
sufficiently compact, light cannot escape from it. In a relativistic
treatment, this should be described as an event horizon. 


6.1.2 Information paradox 


The existence of event horizons in general relativity has deep 
implications, and in particular it helps to explain why it is so diffi- 
cult to reconcile general relativity with quantum mechanics, despite 
nearly a century of valiant attempts. Quantum mechanics has a 
property called unitarity. Mathematically, this says that if the state 
of a quantum mechanical system is given, at a certain time, in the 
form of a vector, then its state at some point in the future can be 
predicted by applying a unitary matrix to that vector. A unitary 
matrix is the generalization to complex numbers of the ordinary 
concept of an orthogonal matrix, and essentially it just represents a 
change of basis, in which the basis vectors have unit length and are 
perpendicular to one another. 


To see what this means physically, consider the following nonex- 
amples. The matrix 
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$$


is not unitary, because its rows and columns are not orthogonal vec- 
tors with unit lengths. If this matrix represented the time-evolution 
of a quantum mechanical system, then its meaning would be that 
any particle in state number 1 would be left alone, but any particle 
in state 2 would disappear. Any information carried by particles in 
state 2 is lost forever and can never be retrieved. This also violates 
the time-reversal symmetry of quantum mechanics. 


Another nonunitary matrix is: 


$$\begin{pmatrix} 1 & 0 \\ 0 & \sqrt{2} \end{pmatrix}$$

Here, any particle in state 2 is increased in amplitude by a factor of
$\sqrt{2}$, meaning that it is doubled in probability. That is, the particle
is cloned. This is the opposite problem compared to the one posed 
by the first matrix, and it is equally problematic in terms of time- 
reversal symmetry and conservation of information. Actually, if we 
could clone a particle in this way, it would violate the Heisenberg 
uncertainty principle. We could make two copies of the particle, 
and then measure the position of one copy and the momentum of 
the other, each with unlimited precision. This would violate the 
uncertainty principle, so we believe that it cannot be done. This is 
known as the no-cloning theorem.¹
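
The failure of unitarity in the two nonexamples can be checked directly. The following is a minimal Maxima sketch; the names A and B are mine, and because both matrices are real, the conjugate transpose reduces to an ordinary transpose. A unitary matrix would give the identity matrix for this product.

```maxima
/* Entries are real, so the conjugate transpose is just the transpose. */
A: matrix([1,0],[0,0]);
B: matrix([1,0],[0,sqrt(2)]);
transpose(A) . A;   /* diagonal entries 1 and 0: state 2 is annihilated */
transpose(B) . B;   /* diagonal entries 1 and 2: state 2 doubles in probability */
```

Neither product is the identity, so neither matrix preserves total probability.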


The existence of event horizons in general relativity violates uni- 
tarity, because it allows information to be destroyed. If a particle is 
thrown behind an event horizon, it can never be retrieved. 


¹ Ahn et al. have shown that the no-cloning theorem is violated in the presence
of closed timelike curves: arxiv.org/abs/1008.0221v1

b / Bill Unruh (1945-).


6.1.3 Radiation from event horizons 


An interesting twist on the situation was introduced by Bill Unruh
in 1976. Observer B aboard the accelerating spaceship believes
in the equivalence principle, so she knows that the local properties
of space at the event horizon would seem entirely normal and
Lorentzian to a local observer A. (The same applies to a black hole’s
horizon.) In particular, B knows that A would see pairs of virtual
particles being spontaneously created and destroyed in the local
vacuum. This is simply a manifestation of the time-energy form of the
uncertainty principle, $\Delta E \Delta t \lesssim h$. Now suppose that a pair of
particles is created, but one is created in front of the horizon and one
behind it. To A these are virtual particles that will have to be
annihilated within the time $\Delta t$, but according to B the one created in
front of the horizon will eventually catch up with the spaceship, and
can be observed there, although it will be red-shifted. The amount
of redshift is given by $\sqrt{g'_{t't'}} = \sqrt{(1+ax')^2}$. Say the pair is created
right near the horizon, at $x' = -1/a$. By the uncertainty principle,
each of the two particles is spread out over a region of space
of size $\Delta x'$. Since these are photons, which travel at the speed of
light, the uncertainty in position is essentially the same as the
uncertainty in time. The forward-going photon’s redshift comes out
to be $a\Delta x' = a\Delta t'$, which by the uncertainty principle should be at
least $ha/E$, so that when the photon is observed by B, its energy is
$E(ha/E) = ha$.


Now B sees a uniform background of photons, with energies of 
around ha, being emitted randomly from the horizon. They are 
being emitted from empty space, so it seems plausible to believe 
that they don’t encode any information at all; they are completely 
random. A surface emitting a completely random (i.e., maximum- 
entropy) hail of photons is a black-body radiator, so we expect that 
the photons will have a black-body spectrum, with its peak at an 
energy of about $ha$. This peak is related to the temperature of
the black body by $E \sim kT$, where $k$ is Boltzmann’s constant. We
conclude that the horizon acts like a black-body radiator with a
temperature $T \sim ha/k$. The more careful treatment by Unruh shows
that the exact relation is $T = ha/4\pi^2 k$, or $ha/4\pi^2 kc$ in SI units.
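
To get a feeling for the size of the effect, here is a short numerical sketch in Maxima. It assumes that the symbol $h$ above is Planck’s constant itself (not the reduced constant), so that the SI formula is $T = ha/4\pi^2 kc$; the constants are rounded values, and the names hP, kB, cl and unruh_T are mine.

```maxima
hP: 6.626e-34$   /* Planck's constant, J s */
kB: 1.381e-23$   /* Boltzmann's constant, J/K */
cl: 2.998e8$     /* speed of light, m/s */
/* Unruh temperature in kelvin for a proper acceleration a in m/s^2 */
unruh_T(a) := hP*a/(4*%pi^2*kB*cl)$
float(unruh_T(9.8));      /* one earth gravity: roughly 4e-20 K */
float(unruh_T(1.0e21));   /* a few kelvin */
```

The conversion factor works out to about $4\times10^{-21}$ K per m/s², which is why only enormous accelerations give temperatures that one could hope to detect.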


An important observation here is that not only do different ob- 
servers disagree about the number of quanta that are present (which 
is true in the case of ordinary Doppler shifts), but about the number 
of quanta in the vacuum as well. B sees photons that according to 
A do not exist. 


Let’s consider some real-world examples of large accelerations: 
                                            acceleration   temperature of
                                            (m/s²)         horizon (K)
bullet fired from a gun                     10°            one?
electron in a CRT                           10°            iC ia
plasmas produced by intense laser pulses    107!           10
proton in a helium nucleus                  102%           10°


To detect Unruh radiation experimentally, we would ideally like to 
be able to accelerate a detector and let it detect the radiation. This 
is clearly impractical. The third line shows that it is possible to 
impart very large linear accelerations to subatomic particles, but 
then one can only hope to infer the effect of the Unruh radiation 
indirectly by its effect on the particles. As shown on the final line, 
examples of extremely large nonlinear accelerations are not hard to 
find, but the interpretation of Unruh radiation for nonlinear motion 
is unclear. A summary of the prospects for direct experimental
detection of this effect is given by Rosu.² This type of experiment is
clearly extremely difficult, but it is one of the few ways in which one 
could hope to get direct empirical insight, under controlled condi- 
tions, into the interface between gravity and quantum mechanics. 


6.2 The Schwarzschild metric 


We now set ourselves the goal of finding the metric describing the 
static spacetime outside a spherically symmetric, nonrotating, body 
of mass m. This problem was first solved by Karl Schwarzschild 
in 1915.³ One byproduct of finding this metric will be the ability
to calculate the geodetic effect exactly, but it will have more far- 
reaching consequences, including the existence of black holes. 


The problem we are solving is similar to calculating the spherically
symmetric solution to Gauss’s law in a vacuum. The solution
to the electrical problem is of the form $\hat{r}/r^2$, with an arbitrary
constant of proportionality that turns out to be proportional to
the charge creating the field. One big difference, however, is that
whereas Gauss’s law is linear, the equation $R_{ab} = 0$ is highly
nonlinear, so that the solution cannot simply be scaled up and down in
proportion to $m$.


The reason for this nonlinearity is fundamental to general rela- 
tivity. For example, when the earth condensed out of the primordial 
solar nebula, a large amount of heat was produced, and this energy 
was then gradually radiated into outer space, decreasing the total 
mass of the earth. If we pretend, as in figure a, that this process 


² http://xxx.lanl.gov/abs/gr-qc/9605032

³ “On the gravitational field of a point mass according to Einstein’s theory,”
Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften
1 (1916) 189. An English translation is available at
http://arxiv.org/abs/physics/9905030v1.


a / The field equations of general relativity are nonlinear.
involved the merging of only two bodies, each with mass $m$, then
the net result was essentially to take separated masses $m$ and $m$ at
rest, and bring them close together to form close-neighbor masses
$m$ and $m$, again at rest. The amount of energy radiated away was
proportional to $m^2$, so the inertial mass of the combined system has
been reduced from $2m$ to $2m+\delta$, where $\delta \sim -Gm^2/c^2r$. The reduction
in inertial mass due to radiation in this scenario is in fact almost
exactly identical to the result of the thought experiment used by
Einstein in his original paper on $E = mc^2$ (reproduced on p. ??).
Based on the equivalence principle, we expect that this reduction 
in inertial mass must be accompanied by an equal reduction in the 
gravitational mass. We therefore find that there is a nonlinear de- 
pendence of the gravitational field on the masses. 


Self-check: The signature of a metric is defined as the list of
positive and negative signs that occur when it is diagonalized.⁴ The
equivalence principle requires that the signature be $+---$ (or
$-+++$, depending on the choice of sign conventions). Verify that
any constant metric (including a metric with the “wrong” signature, 
e.g., 2+2 dimensions rather than 3+1) is a solution to the Einstein 
field equation in vacuum. 


The correspondence principle tells us that our result must have 
a Newtonian limit, but the only variables involved are m and r, so 
this limit must be the one in which r/m is large. Large compared 
to what? There is nothing else available with which to compare, 
so it can only be large compared to some expression composed of 
the unitless constants G and c. We have already chosen units such 
that c = 1, and we will now set G = 1 as well. Mass and distance 
are now comparable, with the conversion factor being
$G/c^2 = 7 \times 10^{-28}$ m/kg, or about a mile per solar mass. Since the earth’s radius
is thousands of times more than a mile, and its mass hundreds of 
thousands of times less than the sun’s, its r/m is very large, and 
the Newtonian approximation is good enough for all but the most 
precise applications, such as the GPS network or the Gravity Probe 
B experiment. 
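
The numbers in this paragraph are easy to reproduce. The following Maxima sketch uses rounded SI values for $G$, $c$, the solar mass, and the earth’s mass and radius; these inputs and the variable names are my own choices rather than anything fixed by the text.

```maxima
Gn: 6.674e-11$          /* gravitational constant, m^3 kg^-1 s^-2 */
cl: 2.998e8$            /* speed of light, m/s */
conv: Gn/cl^2;          /* metres per kilogram, about 7.4e-28 */
conv*1.989e30;          /* solar mass in metres, about 1.5e3 (roughly a mile) */
m_earth: conv*5.972e24$ /* earth's mass expressed in metres */
6.371e6/m_earth;        /* r/m at the earth's surface, about 1.4e9 */
```

The last line shows how deep in the Newtonian regime the earth’s surface is: $r/m$ is of order a billion.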


6.2.1 The zero-mass case 


First let’s demonstrate the trivial solution with flat spacetime. 
In spherical coordinates, we have 


$$ds^2 = dt^2 - dr^2 - r^2\,d\theta^2 - r^2\sin^2\theta\,d\phi^2.$$


⁴ See p. 254 for a different but closely related use of the same term.

The nonvanishing Christoffel symbols (ignoring swaps of the lower
indices) are:


$$\Gamma^\theta_{r\theta} = \frac{1}{r}$$
$$\Gamma^\phi_{r\phi} = \frac{1}{r}$$
$$\Gamma^r_{\theta\theta} = -r$$
$$\Gamma^r_{\phi\phi} = -r\sin^2\theta$$
$$\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$$
$$\Gamma^\phi_{\theta\phi} = \cot\theta$$

Self-check: If we’d been using the $(-+++)$ metric instead of
$(+---)$, what would have been the effect on the Christoffel
symbols? What if we’d expressed the metric in different units, rescaling
all the coordinates by a factor $k$?


Use of ctensor 


In fact, when I calculated the Christoffel symbols above by hand, 
I got one of them wrong, and missed calculating one other because I 
thought it was zero. I only found my mistake by comparing against 
a result in a textbook. The computation of the Riemann tensor is 
an even bigger mess. It’s clearly a good idea to resort to a com- 
puter algebra system here. Cadabra, which was discussed earlier, is 
specifically designed for coordinate-independent calculations, so it 
won’t help us here. A good free and open-source choice is ctensor, 
which is one of the standard packages distributed along with the 
computer algebra system Maxima, introduced on page 75. 


The following Maxima program calculates the Christoffel sym- 
bols found in section 6.2.1. 


1  load(ctensor);
2  ct_coords:[t,r,theta,phi];
3  lg:matrix([1,0,0,0],
4            [0,-1,0,0],
5            [0,0,-r^2,0],
6            [0,0,0,-r^2*sin(theta)^2]);
7  cmetric();
8  christof(mcs);


Line 1 loads the ctensor package. Line 2 sets up the names of the
coordinates. Line 3 defines the $g_{ab}$, with lg meaning “the version of
$g$ with lower indices.” Line 7 tells Maxima to do some setup work
with $g_{ab}$, including the calculation of the inverse matrix $g^{ab}$, which
is stored in ug. Line 8 says to calculate the Christoffel symbols.
The notation mcs refers to the Christoffel symbols with the indices swapped
around a little compared to the convention $\Gamma^a_{bc}$ followed in this
book: mcs[b,c,a] holds what we would call $\Gamma^a_{bc}$. On a Linux system,
we put the program in a file flat.mac
and run it using the command maxima -b flat.mac. The relevant
part of the output is:


                                     1
(%t6)                  mcs        = -
                          2, 3, 3   r

                                     1
(%t7)                  mcs        = -
                          2, 4, 4   r

(%t8)                  mcs        = - r
                          3, 3, 2

                                    cos(theta)
(%t9)                  mcs        = ----------
                          3, 4, 4   sin(theta)

                                           2
(%t10)                 mcs        = - r sin (theta)
                          4, 4, 2

(%t11)                 mcs        = - cos(theta) sin(theta)
                          4, 4, 3


Adding the command ricci(true); at the end of the program results
in the output THIS SPACETIME IS EMPTY AND/OR FLAT, which
saves us hours of tedious computation. The tensor ric (which here
happens to be zero) is computed, and all its nonzero elements are
printed out. There is a similar command riemann(true); to compute
the Riemann tensor riem. This is stored so that riem[i,j,k,l]
is what we would call $R^l_{ikj}$. Note that $l$ is moved to the end, and $j$
and $k$ are also swapped.


6.2.2 Geometrized units 


If the mass creating the gravitational field isn’t zero, then we 
need to decide what units to measure it in. It has already proved 
very convenient to adopt units with c = 1, and we will now also set 
the gravitational constant G = 1. Previously, with only c set to 1, 
the units of time and length were the same, [T] = [L], and so were 
the units of mass and energy, [M] = [E]. With G = 1, all of these 
become the same units, [T] = [L] = [M] = [E].


Self-check: Verify this statement by combining Newton’s law of 
gravity with Newton’s second law of motion. 


The resulting system is referred to as geometrized, because units 
like mass that had formerly belonged to the province of mechanics 
are now measured using the same units we would use to do geometry.


6.2.3 A large-r limit 


Now let’s think about how to tackle the real problem of finding 
the non-flat metric. Although general relativity lets us pick any co- 
ordinates we like, the spherical symmetry of the problem suggests 
using coordinates that exploit that symmetry. The flat-space
coordinates $\theta$ and $\phi$ can still be defined in the same way, and they have
the same interpretation. For example, if we drop a test particle
toward the mass from some point in space, its world-line will have
constant $\theta$ and $\phi$. The $r$ coordinate is a little different. In curved
spacetime, the circumference of a circle is not equal to $2\pi$ times the
distance from the center to the circle; in fact, the discrepancy
between these two is essentially the definition of the Ricci curvature.
This gives us a choice of two logical ways to define $r$. We’ll
define it as the circumference divided by $2\pi$, which has the advantage
that the last two terms of the metric are the same as in flat space:
$-r^2\,d\theta^2 - r^2\sin^2\theta\,d\phi^2$. Since we’re looking for static solutions, none
of the elements of the metric can depend on $t$. Also, the solution is
going to be symmetric under $t \to -t$, $\theta \to -\theta$, and $\phi \to -\phi$, so we
can’t have any off-diagonal elements.⁵ The result is that we have
narrowed the metric down to something of the form


$$ds^2 = h(r)\,dt^2 - k(r)\,dr^2 - r^2\,d\theta^2 - r^2\sin^2\theta\,d\phi^2,$$


where both $h$ and $k$ approach 1 for $r \to \infty$, where spacetime is flat.


For guidance in how to construct $h$ and $k$, let’s consider the
acceleration of a test particle at $r \gg m$, which we know to be
$-m/r^2$, since nonrelativistic physics applies there. We have


$$\nabla_t v^r = \partial_t v^r + \Gamma^r_{tc} v^c.$$


An observer free-falling along with the particle observes its
acceleration to be zero, and a tensor that is zero in one coordinate system
is zero in all others. Since the covariant derivative is a tensor, we
conclude that $\nabla_t v^r = 0$ in all coordinate systems, including the
$(t,r,\ldots)$ system we’re using. If the particle is released from rest,
then initially its velocity four-vector is $(1,0,0,0)$, so we find that its
acceleration in $(t,r)$ coordinates is
$-\Gamma^r_{tt} = \frac{1}{2}g^{rr}\partial_r g_{tt} = -\frac{1}{2}h'/k$, where $g^{rr} = -1/k$.
Setting this equal to $-m/r^2$, we find $h'/k = 2m/r^2$ for $r \gg m$.
Since $k \approx 1$ for large $r$, we have


$$h' = \frac{2m}{r^2} \quad\text{for } r \gg m.$$


The interpretation of this calculation is as follows. We assert the
equivalence principle, by which the acceleration of a free-falling
particle can be said to be zero. After some calculations, we find that
the rate at which time flows (encoded in $h$) is not constant. It is
different for observers at different heights in a gravitational
potential well. But this is something we had already deduced, without
the index gymnastics, in example 7 on page 129.


⁵ For more about time-reversal symmetry, see p. 223.


Integrating, we find that for large $r$, $h = 1 - 2m/r$.
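
Spelled out, the integration uses the large-$r$ relation $h' = 2m/r^2$ together with the boundary condition that $h$ approaches 1 far from the mass; the constant $C$ below is just a label of mine for the constant of integration.

```latex
h'(r) = \frac{2m}{r^2}
\quad\Longrightarrow\quad
h(r) = -\frac{2m}{r} + C , \qquad
\lim_{r\to\infty} h = 1 \;\Longrightarrow\; C = 1 ,
\qquad\text{so}\qquad
h(r) = 1 - \frac{2m}{r} \quad (r \gg m).
```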


6.2.4 The complete solution 
A series solution 


We’ve learned some interesting things, but we still have an ex- 
tremely nasty nonlinear differential equation to solve. One way to 
attack a differential equation, when you have no idea how to pro- 
ceed, is to try a series solution. We have a small parameter m/r to 
expand around, so let’s try to write h and k as series of the form 


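$$h = \sum_{n=0}^{\infty} a_n \left(\frac{m}{r}\right)^n$$
$$k = \sum_{n=0}^{\infty} b_n \left(\frac{m}{r}\right)^n$$


We already know $a_0$, $a_1$, and $b_0$. Let’s try to find $b_1$. In the
following Maxima code I omit the factors of $m$ for convenience.
In other words, we’re looking for the solution for $m = 1$.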


1  load(ctensor);
2  ct_coords:[t,r,theta,phi];
3  lg:matrix([(1-2/r),0,0,0],
4            [0,-(1+b1/r),0,0],
5            [0,0,-r^2,0],
6            [0,0,0,-r^2*sin(theta)^2]);
7  cmetric();
8  ricci(true);


I won’t reproduce the entire output of the Ricci tensor, which 
is voluminous. We want all four of its nonvanishing components to 
vanish as quickly as possible for large values of $r$, so I decided to
fiddle with $R_{tt}$, which looked as simple as any of them. It appears
to vary as $r^{-4}$ for large $r$, so let’s evaluate $\lim_{r\to\infty}(r^4 R_{tt})$:


9  limit(r^4*ric[1,1],r,inf);


The result is $(b_1 - 2)/2$, so let’s set $b_1 = 2$. The approximate solution
we’ve found so far (reinserting the $m$’s),


$$ds^2 = \left(1-\frac{2m}{r}\right)dt^2 - \left(1+\frac{2m}{r}\right)dr^2 - r^2\,d\theta^2 - r^2\sin^2\theta\,d\phi^2,$$


was first derived by Einstein in 1915, and he used it to solve the
problem of the non-Keplerian relativistic correction to the orbit of
Mercury, which was one of the first empirical tests of general
relativity.

Continuing in this fashion, the results are as follows:


$a_0 = 1$     $b_0 = 1$
$a_1 = -2$    $b_1 = 2$
$a_2 = 0$     $b_2 = 4$
$a_3 = 0$     $b_3 = 8$


The closed-form solution 


The solution is unexpectedly simple, and can be put into closed 
form. The approximate result we found for h was in fact exact. 
For $k$ we have a geometric series $1/(1 - 2/r)$, and when we reinsert
the factor of $m$ in the only way that makes the units work, we get
$1/(1-2m/r)$. The result for the metric is


$$ds^2 = \left(1-\frac{2m}{r}\right)dt^2 - \left(\frac{1}{1-2m/r}\right)dr^2 - r^2\,d\theta^2 - r^2\sin^2\theta\,d\phi^2.$$


This is called the Schwarzschild metric. A quick calculation in Max- 
ima demonstrates that it is an exact solution for all r, i.e., the Ricci 
tensor vanishes everywhere, even at r < 2m, which is outside the 
radius of convergence of the geometric series. 
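
The quick calculation referred to here might look something like the following sketch, which keeps the mass as a symbol m rather than setting it to 1. The final line is an extra check of my own: it expands the closed form for $k$ in the variable x, standing for $m/r$, to confirm that the coefficients are 1, 2, 4, 8.

```maxima
load(ctensor);
ct_coords:[t,r,theta,phi];
/* Schwarzschild metric, signature +--- */
lg:matrix([1-2*m/r,0,0,0],
          [0,-1/(1-2*m/r),0,0],
          [0,0,-r^2,0],
          [0,0,0,-r^2*sin(theta)^2]);
cmetric();
ricci(true);   /* should find no nonvanishing components */
/* geometric series for k, with x standing for m/r */
taylor(1/(1-2*x),x,0,3);   /* 1 + 2 x + 4 x^2 + 8 x^3 + ... */
```

If Maxima’s simplification goes through, ricci(true) should report an empty spacetime with no restriction on $r$, in contrast to the series, which only converges for $r > 2m$.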


Time-reversal symmetry 


The Schwarzschild metric is invariant under time reversal, since 
time occurs only in the form of $dt^2$, which stays the same under
$dt \to -dt$. This is the same time-reversal symmetry that occurs in
Newtonian gravity, where the field is described by the gravitational 
acceleration g, and accelerations are time-reversal invariant. 


Fundamentally, this is an example of general relativity’s coordi- 
nate independence. The laws of physics provided by general rela- 
tivity, such as the vacuum field equation, are invariant under any 
smooth coordinate transformation, and $t \to -t$ is such a coordinate
transformation, so general relativity has time-reversal symmetry. 
Since the Schwarzschild metric was found by imposing time-reversal- 
symmetric boundary conditions on a time-reversal-symmetric differ- 
ential equation, it is an equally valid solution when we time-reverse 
it. Furthermore, we expect the metric to be invariant under time 
reversal, unless spontaneous symmetry breaking occurs (see p. 348). 


This suggests that we ask the more fundamental question of what 
global symmetries general relativity has. Does it have symmetry 
under parity inversion, for example? Or can we take any solution 
such as the Schwarzschild spacetime and transform it into a frame 
of reference in which the source of the field is moving uniformly in a 
certain direction? Because general relativity is locally equivalent to 
special relativity, we know that these symmetries are locally valid. 
But it may not even be possible to define the corresponding global 
symmetries. For example, there are some spacetimes on which it
is not even possible to define a global time coordinate. On such a 
spacetime, which is described as not time-orientable, there does not 
exist any smooth vector field that is everywhere timelike, so it is 
not possible to define past versus future light-cones at all points in 
space without having a discontinuous change in the definition occur 
somewhere. This is similar to the way in which a Möbius strip does
not allow an orientation of its surface (an “up” direction as seen by 
an ant) to be defined globally. 


Suppose that our spacetime is time-orientable, and we are able 
to define coordinates (p,q,r,s) such that p is always the timelike 
coordinate. Because $q \to -q$ is a smooth coordinate transformation,
we are guaranteed that our spacetime remains a valid solution
of the field equations under this change. But that doesn’t mean
that what we’ve found is a symmetry under parity inversion in a
plane. Our coordinate $q$ is not necessarily interpretable as distance
along a particular “$q$ axis.” Such axes don’t even exist globally in
general relativity. A coordinate does not even have to have units
of time or distance; it could be an angle, for example, or it might
not have any geometrical significance at all. Similarly, we could do
a transformation $q \to q' = q + kp$. If we think of $q$ as measuring
spatial position and p time, then this looks like a Galilean transfor- 
mation, with k being the velocity. The solution to the field equations 
obtained after performing this transformation is still a valid solu- 
tion, but that doesn’t mean that relativity has Galilean symmetry 
rather than Lorentz symmetry. There is no sensible way to define 
a Galilean transformation acting on an entire spacetime, because 
when we talk about a Galilean transformation we assume the exis- 
tence of things like global coordinate axes, which do not even exist 
in general relativity. 


6.2.5 Geodetic effect 


As promised in section 5.5.1, we now calculate the geodetic effect 
on Gravity Probe B, including all the niggling factors of 3 and $\pi$. To
make the physics clear, we approach the actual calculation through 
a series of warmups. 


Flat space 


As a first warmup, consider two spatial dimensions, represented 
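by Euclidean polar coordinates $(r,\phi)$. Parallel-transport of a
gyroscope’s angular momentum around a circle of constant $r$ gives


$$\nabla_\phi L^\phi = 0$$
$$\nabla_\phi L^r = 0.$$

Computing the covariant derivatives, we have

$$0 = \partial_\phi L^\phi + \Gamma^\phi_{\phi r} L^r$$
$$0 = \partial_\phi L^r + \Gamma^r_{\phi\phi} L^\phi.$$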
The Christoffel symbols are $\Gamma^\phi_{\phi r} = 1/r$ and $\Gamma^r_{\phi\phi} = -r$. This is
all made to look needlessly complicated because $L^\phi$ and $L^r$ are
expressed in different units. Essentially the vector is staying the same,
but we’re expressing it in terms of basis vectors in the $r$ and $\phi$
directions that are rotating. To see this more transparently, let $r = 1$,
and write $P$ for $L^\phi$ and $Q$ for $L^r$, so that


$$P' = -Q$$
$$Q' = P,$$


which have solutions such as $P = \cos\phi$, $Q = \sin\phi$. For each orbit
($2\pi$ change in $\phi$), the basis vectors rotate by $2\pi$, so the angular
momentum vector once again has the same components. In other
words, it hasn’t really changed at all.


Spatial curvature only 


The flat-space calculation above differs in two ways from the 
actual result for an orbiting gyroscope: (1) it uses a flat spatial 
geometry, and (2) it is purely spatial. The purely spatial nature of 
the calculation is manifested in the fact that there is nothing in the 
result relating to how quickly we’ve moved the vector around the 
circle. We know that if we whip a gyroscope around in a circle on 
the end of a rope, there will be a Thomas precession (section 2.5.4), 
which depends on the speed. 


As our next warmup, let’s curve the spatial geometry, but con- 
tinue to omit the time dimension. Using the Schwarzschild metric, 
we replace the flat-space Christoffel symbol $\Gamma^r_{\phi\phi} = -r$ with $-r+2m$.
The differential equations for the components of the $L$ vector, again
evaluated at $r = 1$ for convenience, are now


$$P' = -Q$$
$$Q' = (1-\epsilon)P,$$


where $\epsilon = 2m$. The solutions rotate with frequency $\omega' = \sqrt{1-\epsilon}$.
The result is that when the basis vectors rotate by $2\pi$, the
components no longer return to their original values; they lag by a factor
of $\sqrt{1-\epsilon} \approx 1 - m$. Putting the factors of $r$ back in, this is $1 - m/r$.
The deviation from unity shows that after one full revolution, the $L$
vector no longer has quite the same components expressed in terms
of the $(r, \phi)$ basis vectors.


To understand the sign of the effect, let’s imagine a
counterclockwise rotation. The $(r,\phi)$ basis vectors rotate counterclockwise, so relative
to them, the $L$ vector rotates clockwise. After one revolution, it has
not rotated clockwise by a full $2\pi$, so its orientation is now slightly
counterclockwise compared to what it was. Thus the contribution 
to the geodetic effect arising from spatial curvature is in the same 
direction as the orbit. 
Comparing with the actual results from Gravity Probe B, we see
that the direction of the effect is correct. The magnitude, however,
is off. The precession accumulated over $n$ periods is $2\pi nm/r$, or,
in SI units, $2\pi nGm/c^2r$. Using the data from section 2.5.4, we find
$\Delta\theta = 2 \times 10^{-5}$ radians, which is too small compared to the data
shown in figure b on page 171.


2+1 dimensions 


To reproduce the experimental results correctly, we need to in- 
clude the time dimension. The angular momentum vector now has 
components $(L^\phi, L^r, L^t)$. The physical interpretation of the $L^t$
component is obscure at this point; we’ll return to this question later.


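Writing down the total derivatives of the three components, and
notating $dt/d\phi$ as $\omega^{-1}$, we have


$$\frac{dL^\phi}{d\phi} = \partial_\phi L^\phi + \omega^{-1}\partial_t L^\phi$$
$$\frac{dL^r}{d\phi} = \partial_\phi L^r + \omega^{-1}\partial_t L^r$$
$$\frac{dL^t}{d\phi} = \partial_\phi L^t + \omega^{-1}\partial_t L^t.$$


Setting the covariant derivatives equal to zero gives


$$0 = \partial_\phi L^\phi + \Gamma^\phi_{\phi r} L^r$$
$$0 = \partial_\phi L^r + \Gamma^r_{\phi\phi} L^\phi$$
$$0 = \partial_t L^r + \Gamma^r_{tt} L^t$$
$$0 = \partial_t L^t + \Gamma^t_{tr} L^r.$$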


Self-check: There are not just four but six covariant derivatives 
that could in principle have occurred, and in these six covariant 
derivatives we could have had a total of 18 Christoffel symbols. Of 
these 18, only four are nonvanishing. Explain based on symmetry 
arguments why the following Christoffel symbols must vanish: r? ot? 
Ie. 


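Putting all this together in matrix form, we have $L' = ML$,
where


$$M = \begin{pmatrix} 0 & -1 & 0 \\ 1-\epsilon & 0 & -\epsilon(1-\epsilon)/2\omega \\ 0 & -\epsilon/2\omega(1-\epsilon) & 0 \end{pmatrix}.$$


The solutions of this differential equation oscillate like $e^{i\Omega\phi}$, where
$i\Omega$ is an eigenvalue of the matrix.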


Self-check: The frequency in the purely spatial calculation was 
found by inspection. Verify the result by applying the eigenvalue 
technique to the relevant 2 x 2 submatrix. 
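
A sketch of how this self-check could be done in Maxima follows. The name M2 and the use of e for $\epsilon$ and s for the eigenvalue are my own choices; the matrix is the upper-left $2\times2$ block of $M$, which encodes $P' = -Q$, $Q' = (1-\epsilon)P$.

```maxima
/* purely spatial system L' = M2 . L, with e standing for epsilon */
M2: matrix([0,-1],
           [1-e,0]);
expand(charpoly(M2,s));   /* characteristic polynomial, equal to s^2 + 1 - e */
eigenvalues(M2);          /* the two roots satisfy s^2 = -(1-e) */
```

Writing the eigenvalues as $i\Omega$ gives $\Omega^2 = 1-\epsilon$, in agreement with the frequency $\sqrt{1-\epsilon}$ found by inspection.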
To lowest order, we can use the Newtonian relation $\omega^2 r = Gm/r^2$
and neglect terms of order $\epsilon^2$, so that the two new off-diagonal
matrix elements are both approximated as $\sqrt{\epsilon/2}$. The three resulting
eigenfrequencies are zero and $\Omega = \pm[1 - (3/2)m/r]$.
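
These eigenfrequencies can be checked with a few lines of Maxima. This is a sketch under the same conventions as the text ($r = 1$, $G = 1$, so that the Newtonian relation reads $\omega^2 = m = \epsilon/2$); the symbols e, w and s stand for $\epsilon$, $\omega$ and the eigenvalue, and are my own naming.

```maxima
/* e stands for epsilon = 2m (at r = 1), w for omega */
M: matrix([0,-1,0],
          [1-e,0,-e*(1-e)/(2*w)],
          [0,-e/(2*w*(1-e)),0]);
p: charpoly(M,s)$
/* impose the Newtonian orbit condition w^2 = e/2 */
factor(ratsimp(subst(w=sqrt(e/2),p)));
/* first-order expansion of the nonzero eigenfrequency */
taylor(sqrt(1-3*e/2),e,0,1);
```

The characteristic polynomial should come out proportional to $s(s^2 + 1 - 3\epsilon/2)$, so the eigenvalues are $s = 0$ and $s = \pm i\sqrt{1-3\epsilon/2}$. Expanding the square root to first order gives $1 - 3\epsilon/4$, which is $1 - (3/2)m/r$ once $\epsilon = 2m/r$ is restored.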


The presence of the mysterious zero-frequency solution can now
be understood by recalling the earlier mystery of the physical
interpretation of the angular momentum’s $L^t$ component. Our results
come from calculating parallel transport, and parallel transport is
a purely geometric process, so it gives the same result regardless
of the physical nature of the four-vector. Suppose that we had
instead chosen the velocity four-vector as our guinea pig. The
definition of a geodesic is that it parallel-transports its own tangent
vector, so the velocity vector has to stay constant. If we inspect
the eigenvector corresponding to the zero eigenfrequency,
we find a timelike vector that is parallel to the velocity four-vector.
In our 2+1-dimensional space, the other two eigenvectors, which
are spacelike, span the subspace of spacelike vectors, which are the
ones that can physically be realized as the angular momentum of
a gyroscope. These two eigenvectors, which vary as $e^{\pm i\Omega\phi}$, can be
superposed to make real-valued spacelike solutions that match the
initial conditions, and these lag the rotation of the basis vectors
by $\Delta\Omega = (3/2)m/r$. This is greater than the purely spatial result
by a factor of 3/2. The resulting precession angle, over $n$ orbits of
Gravity Probe B, is $3\pi nGm/c^2r = 3 \times 10^{-5}$ radians, in excellent
agreement with experiment.


One will see apparently contradictory statements in the literature
about whether Thomas precession occurs for a satellite: “The
Thomas precession comes into play for a gyroscope on the surface
of the Earth ..., but not for a gyroscope in a freely moving
satellite.”$^6$ But: “The total effect, geometrical and Thomas, gives the
well-known Fokker-de Sitter precession of $3\pi m/r$, in the same sense
as the orbit.”$^7$ The second statement arises from subtracting the
purely spatial result from the 2+1-dimensional result, and noting
that the absolute value of this difference is the same as the Thomas
precession that would have been obtained if the gyroscope had been
whirled at the end of a rope. In my opinion this is an unnatural
way of looking at the physics, for two reasons. (1) The signs don’t
match, so one is forced to say that the Thomas precession has a
different sign depending on whether the rotation is the result of
gravitational or nongravitational forces. (2) Referring to observation,
it is clearly artificial to treat the spatial curvature and Thomas
effects separately, since neither one can be disentangled from the
other by varying the quantities $n$, $m$, and $r$. For more discussion,
see tinyurl.com/me3qf8o.


$^6$ Misner, Thorne, and Wheeler, Gravitation, p. 1118
$^7$ Rindler, Essential Relativity, 1969, p. 141

6.2.6 Orbits


The main event of Newton’s Principia Mathematica is his proof 
of Kepler’s laws. Similarly, Einstein’s first important application in 
general relativity, which he began before he even had the exact form 
of the Schwarzschild metric in hand, was to find the non-Newtonian 
behavior of the planet Mercury. The planets deviate from Keplerian 
behavior for a variety of Newtonian reasons, and in particular there 
is a long list of reasons why the major axis of a planet’s elliptical 
orbit is expected to gradually rotate. When all of these were taken 
into account, however, there was a remaining discrepancy of about 
40 seconds of arc per century, or $6.6 \times 10^{-7}$ radians per orbit. The
direction of the effect was in the forward direction, in the sense that 
if we view Mercury’s orbit from above the ecliptic, so that it orbits 
in the counterclockwise direction, then the gradual rotation of the 
major axis is also counterclockwise. 


As a very rough hand-wavy explanation for this effect, consider 
the spatial part of the curvature of the spacetime surrounding the 
sun. This spatial curvature is positive, so a circle’s circumference 
is less than $2\pi$ times its radius. We could imagine that this would
cause Mercury to get back to a previously visited angular position 
before it has had time to complete its Newtonian cycle of radial 
motion. Arguments such as this one, however, should not be taken 
too seriously. A mathematical analysis is required. 


Based on the examples in section 5.5, we expect that the effect 
will be of order m/r, where m is the mass of the sun and r is the 
radius of Mercury’s orbit. This works out to be $2.5 \times 10^{-8}$, which
is smaller than the observed precession by a factor of about 26. 
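
The estimate above is easy to reproduce. The following is only a rough sketch: the radius of Mercury's orbit (taken here as a semimajor axis of about 5.79 x 10^10 m) is an assumed value that is not given in the text, and the other constants are the same rounded ones used in the program in section 6.2.8.

```python
# Rough check of the estimate m/r for Mercury, in SI units.
G = 6.67e-11         # gravitational constant
c = 3.00e8           # speed of light
m_kg = 1.99e30       # mass of sun
r_orbit = 5.79e10    # ASSUMED: semimajor axis of Mercury's orbit, in meters

def m_over_r(mass_kg, radius_m):
    """Unitless ratio Gm/(c^2 r)."""
    return G * mass_kg / (c**2 * radius_m)

ratio = m_over_r(m_kg, r_orbit)
observed = 6.6e-7    # anomalous precession per orbit quoted in the text, radians

print("m/r = %.2e" % ratio)
print("observed precession / (m/r) = %.0f" % (observed / ratio))
```

The second printed number is the unitless factor that the detailed calculation has to account for.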


Conserved quantities 


If Einstein had had a computer on his desk, he probably would 
simply have integrated the motion numerically using the geodesic 
equation. But it is possible to simplify the problem enough to at- 
tack it with pencil and paper, if we can find the relevant conserved 
quantities of the motion. Nonrelativistically, these are energy and 
angular momentum. 


Consider a rock falling directly toward the sun. The Schwarzschild
metric is of the special form

$$ds^2 = h(r)\,dt^2 - k(r)\,dr^2 - \ldots.$$

The rock’s trajectory is a geodesic, so it extremizes the proper time
$s$ between any two events fixed in spacetime, just as a piece of string
stretched across a curved surface extremizes its length. Let the rock
pass through distance $r_1$ in coordinate time $t_1$, and then through $r_2$
in $t_2$. (These should really be notated as $\Delta r_1$, ... or $dr_1$, ..., but we
avoid the $\Delta$’s or $d$’s for convenience.) Approximating the geodesic
using two line segments, the proper time is

$$\begin{aligned}
s &= s_1 + s_2 \\
  &= \sqrt{h_1 t_1^2 - k_1 r_1^2} + \sqrt{h_2 t_2^2 - k_2 r_2^2} \\
  &= \sqrt{h_1 t_1^2 - k_1 r_1^2} + \sqrt{h_2 (T - t_1)^2 - k_2 r_2^2},
\end{aligned}$$

where $T = t_1 + t_2$ is fixed. If this is to be extremized with respect
to $t_1$, then $ds/dt_1 = 0$, which leads to

$$0 = \frac{h_1 t_1}{s_1} - \frac{h_2 t_2}{s_2},$$



which means that

$$h\frac{dt}{ds} = g_{tt}\frac{dx^t}{ds} = \frac{dx_t}{ds}$$

is a constant of the motion. Except for an irrelevant factor of $m$,
this is the same as $p_t$, the timelike component of the covariant
momentum vector. We’ve already seen that in special relativity, the
timelike component of the momentum four-vector is interpreted as
the mass-energy $E$, and the quantity $p_t$ has a similar interpretation
here. Note that no special assumption was made about the form of
the functions $h$ and $k$. In addition, it turns out that the assumption
of purely radial motion was unnecessary. All that really mattered
was that $h$ and $k$ were independent of $t$. Therefore we will have
a similar conserved quantity $p_\mu$ any time the metric’s components,
expressed in a particular coordinate system, are independent of $x^\mu$.
(This is generalized on p. 266.) In particular, the Schwarzschild
metric’s components are independent of $\phi$ as well as $t$, so we have a
second conserved quantity $p_\phi$, which is interpreted as angular
momentum.


Writing these two quantities out explicitly in terms of the
contravariant coordinates, in the case of the Schwarzschild spacetime,
we have

$$E = \left(1 - \frac{2m}{r}\right)\frac{dt}{ds}$$

and

$$L = r^2\frac{d\phi}{ds}$$

for the conserved energy per unit mass and angular momentum per
unit mass.


In interpreting the energy per unit mass $E$, it is important to
understand that in the general-relativistic context, there is no useful
way of separating the rest mass, kinetic energy, and potential
energy into separate terms, as we could in Newtonian mechanics.
$E$ includes contributions from all of these, and turns out to be less
than the contribution due to the rest mass (i.e., less than 1) for a
planet orbiting the sun. It turns out that $E$ can be interpreted as a
measure of the additional gravitational mass that the solar system
possesses as measured by a distant observer, due to the presence of
the planet. It then makes sense that $E$ is conserved; by analogy
with Newtonian mechanics, we would expect that any gravitational
effects that depended on the detailed arrangement of the masses
within the solar system would decrease as $1/r^4$, becoming negligible
at large distances and leaving a constant field varying as $1/r^2$.


b / Proof that if the metric’s components are independent of $t$,
the geodesic of a test particle conserves $p_t$.


One way of seeing that it doesn’t make sense to split $E$ into parts
is that although the equation given above for $E$ involves a specific set
of coordinates, $E$ can actually be expressed as a Lorentz-invariant
scalar (see p. 266). This property makes $E$ especially interesting and
useful (and different from the energy in Newtonian mechanics, which 
is conserved but not frame-independent). On the other hand, the 
kinetic and potential energies depend on the velocity and position. 
These are completely dependent on the coordinate system, and there 
is nothing physically special about the coordinate system we’ve used 
here. Suppose a particle is falling directly toward the earth, and an 
astronaut in a space-suit is free-falling along with it and monitoring 
its progress. The astronaut judges the particle’s kinetic energy to 
be zero, but other observers say it’s nonzero, so it’s clearly not a 
Lorentz scalar. And suppose the astronaut insists on defining a 
potential energy to go along with this kinetic energy. The potential 
energy must be decreasing, since the particle is getting closer to the 
earth, but then there is no way that the sum of the kinetic and 
potential energies could be constant. 


Perihelion advance 


For convenience, let the mass of the orbiting rock be 1, while m 
stands for the mass of the gravitating body. 


The unit mass of the rock is a third conserved quantity, and
since the squared magnitude of the momentum vector equals the square of
the mass, we have for an orbit in the plane $\theta = \pi/2$,

$$\begin{aligned}
1 &= g^{tt}p_t^2 - g^{rr}p_r^2 - g^{\phi\phi}p_\phi^2 \\
  &= g^{tt}p_t^2 - g_{rr}(p^r)^2 - g^{\phi\phi}p_\phi^2 \\
  &= \frac{1}{1-2m/r}E^2 - \frac{1}{1-2m/r}\left(\frac{dr}{ds}\right)^2 - \frac{1}{r^2}L^2.
\end{aligned}$$

Rearranging terms and writing $\dot{r}$ for $dr/ds$, this becomes

$$\dot{r}^2 = E^2 - (1-2m/r)(1+L^2/r^2)$$

or

$$\dot{r}^2 = E^2 - U^2,$$

where

$$U^2 = (1-2m/r)(1+L^2/r^2).$$


There is a varied and strange family of orbits in the Schwarzschild 
field, including bizarre knife-edge trajectories that take several nearly 
circular turns before suddenly flying off. We turn our attention in- 
stead to the case of an orbit such as Mercury’s which is nearly 
Newtonian and nearly circular. 


Nonrelativistically, a circular orbit has radius $r = L^2/m$ and
period $T = 2\pi L^3/m^2$.


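Relativistically, a circular orbit occurs when there is only one
turning point at which $\dot{r} = 0$. This requires that $E^2$ equal the
minimum value of $U^2$, which occurs at

$$\begin{aligned}
r &= \frac{L^2}{2m}\left(1 + \sqrt{1 - 12m^2/L^2}\right) \\
  &\approx \frac{L^2}{m}(1-\epsilon),
\end{aligned}$$

where $\epsilon = 3(m/L)^2$. A planet in a nearly circular orbit oscillates
between perihelion and aphelion with a period that depends on the
curvature of $U^2$ at its minimum. We have

$$\begin{aligned}
k &= \frac{d^2(U^2)}{dr^2} \\
  &= \frac{d^2}{dr^2}\left(1 - \frac{2m}{r} + \frac{L^2}{r^2} - \frac{2mL^2}{r^3}\right) \\
  &= -\frac{4m}{r^3} + \frac{6L^2}{r^4} - \frac{24mL^2}{r^5}.
\end{aligned}$$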


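The period of the oscillations is

$$\begin{aligned}
\Delta s_{osc} &= 2\pi\sqrt{2/k} \\
               &= 2\pi L^3 m^{-2}(1-\epsilon).
\end{aligned}$$

The period of the azimuthal motion is

$$\begin{aligned}
\Delta s_{az} &= 2\pi r^2/L \\
              &= 2\pi L^3 m^{-2}(1-2\epsilon).
\end{aligned}$$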


The periods are slightly mismatched because of the relativistic
correction terms. The period of the radial oscillations is longer, so
that, as expected, the perihelion shift is in the forward direction.
The mismatch is $\epsilon\Delta s$, and because of it each orbit rotates the
major axis by an angle $2\pi\epsilon = 6\pi(m/L)^2 = 6\pi m/r$. Plugging in the
data for Mercury, we obtain $5.8 \times 10^{-7}$ radians per orbit, which
agrees with the observed value to within about 10%. Eliminating
some of the approximations we’ve made brings the results in
agreement to within the experimental error bars, and Einstein recalled
that when the calculation came out right, “for a few days, I was
beside myself with joyous excitement.”
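
As a sanity check on the expansion in $\epsilon$, the two periods can be evaluated numerically without approximation and compared with $6\pi(m/L)^2$. This is a sketch only: the function name and the values of m and L are hypothetical choices for illustration, in units with G = c = 1, and are not Mercury's data.

```python
import math

def perihelion_advance(m, L):
    """Angle (radians) by which the major axis turns per radial oscillation,
    for a nearly circular orbit, in units with G = c = 1."""
    # Radius of the circular orbit: the minimum of U^2.
    r = (L**2 / (2.0 * m)) * (1.0 + math.sqrt(1.0 - 12.0 * m**2 / L**2))
    # Curvature of U^2 at that radius.
    k = -4.0 * m / r**3 + 6.0 * L**2 / r**4 - 24.0 * m * L**2 / r**5
    ds_osc = 2.0 * math.pi * math.sqrt(2.0 / k)   # period of radial oscillation
    ds_az = 2.0 * math.pi * r**2 / L              # period of azimuthal motion
    # Azimuthal angle swept in one radial period, minus one full turn.
    return 2.0 * math.pi * (ds_osc / ds_az - 1.0)

m, L = 1.0, 100.0          # hypothetical values with m/L << 1
exact = perihelion_advance(m, L)
approx = 6.0 * math.pi * (m / L)**2
print("numerical: %.6e   6*pi*(m/L)^2: %.6e" % (exact, approx))
```

For small m/L the two numbers should agree closely, with the difference coming from the terms of higher order in $\epsilon$ that the pencil-and-paper calculation drops.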


Further attempts were made to improve on the precision of this 
historically crucial test of general relativity. Radar now gives the 
most precise orbital data for Mercury. At the level of about one part 
per thousand, however, an effect creeps in due to the oblateness of 
the sun, which is difficult to measure precisely. 


In 1974, astronomers J.H. Taylor and R.A. Hulse of Princeton, 
working at the Arecibo radio telescope, discovered a binary star 
system whose members are both neutron stars. The detection of 
the system was made possible because one of the neutron stars is 
a pulsar: a neutron star that emits a strong radio pulse in the 
direction of the earth once per rotational period. The orbit is highly 
elliptical, and the minimum separation between the two stars is very 
small, about the same as the radius of our sun. Both because the 
r is small and because the period is short (about 8 hours), the rate 
of perihelion advance per unit time is very large, about 4.2 degrees 
per year. The system has been compared in great detail with the 
predictions of general relativity,$^8$ giving extremely good agreement,
and as a result astronomers have been confident enough to reason in 
the opposite direction and infer properties of the system, such as its 
total mass, from the general-relativistic analysis. The system’s orbit 
is decaying due to the radiation of energy in the form of gravitational 
waves, which are predicted to exist by relativity. 


6.2.7 Doppler shifts and time dilation 


The existence of gravitational Doppler shifts and time dilation
is a direct consequence of the equivalence principle, as is the
quantitative result for a uniform field. Therefore, observations such as
the Pound-Rebka experiment are not specifically tests of general
relativity. For this we need high-precision tests in strong fields. Such
a test was achieved in 2018 with the observation of gravitational
Doppler shifts from the star S2, which orbits the black hole
Sagittarius A*.$^9$ We can conceptualize such an experiment in terms of the
emission of two successive rays of light from radius r, the rays be- 
ing observed by an observer at infinity. These rays are our abstract 
mathematical model. In reality, they could, for example, represent 
the motion of two successive radio beeps, two photons, or two suc- 
cessive wavefronts of an electromagnetic wave. Both the emitter and 
the observer are at rest relative to the gravitating body. We wish to 
find the factor by which the time interval is increased at observation 
compared to emission. 


$^8$ http://arxiv.org/abs/astro-ph/0407149
$^9$ arxiv.org/abs/1807.09409

Because the Schwarzschild metric, expressed in Schwarzschild
coordinates, is independent of the Schwarzschild time $t$, it follows
that any trajectory of a test particle $r(t)$ remains valid if shifted
to $r(t + \delta)$. Therefore the Schwarzschild time interval $\Delta t$ between
emission of the two rays is the same as the interval $\Delta t'$ at
absorption. The corresponding proper times for the stationary emitter and
observer are then in proportion to $\sqrt{g_{tt}} = \sqrt{1-2m/r}$. A Doppler
shift of this size was observed in the 2018 work.
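
The factor by which the interval between the two rays is stretched, for an observer at rest at infinity, is the reciprocal of $\sqrt{1-2m/r}$. A minimal sketch follows; the function name is hypothetical, and the argument is the unitless ratio m/r in geometrized units.

```python
import math

def stretch_factor(m_over_r):
    """Ratio (interval observed at infinity) / (interval at emission)
    for an emitter at rest at Schwarzschild radius r."""
    if m_over_r >= 0.5:
        # For r <= 2m the emitter cannot be at rest, so the formula fails.
        raise ValueError("formula requires r > 2m")
    return 1.0 / math.sqrt(1.0 - 2.0 * m_over_r)

# Weak field: the factor is very nearly 1 + m/r.
print(stretch_factor(2.1e-6))
# Strong field: the factor grows without bound as r approaches 2m.
print(stretch_factor(0.375))
```

The guard on the argument mirrors the restriction discussed next: the analysis assumes an emitter at rest, which is impossible for $r < 2m$.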


This result misbehaves for r < 2m. In such cases we would be 
discussing a black hole, and the breakdown in the analysis can be 
traced back to the fact that we assumed the emitter to be at rest. 
For r < 2m, we will see that this becomes impossible. It is possible 
in principle to observe Doppler shifts of rays that cross from emission 
at r > 2m to observation at r < 2m, but the observer cannot be at 
rest. This situation is analyzed in example 8, p. 267. 


6.2.8 Deflection of light 


As discussed on page 171, one of the first tests of general rel- 
ativity was Eddington’s measurement of the deflection of rays of 
light by the sun’s gravitational field. The deflection measured by 
Eddington was 1.6 seconds of arc. For a light ray that grazes the 
sun’s surface, the only physically relevant parameters are the sun’s 
mass m and radius r. Since the deflection is unitless, it can only 
depend on m/r, the unitless ratio of the sun’s mass to its radius. 
Expressed in SI units, this is $Gm/c^2 r$, which comes out to be about
$10^{-6}$. Roughly speaking, then, we expect the order of magnitude of
the effect to be about this big, and indeed $10^{-6}$ radians comes out
to be in the same ball-park as a second of arc. We get a similar 
estimate in Newtonian physics by treating a photon as a (massive) 
particle moving at speed c. 


We will go ahead and actually calculate this deflection below,
but before diving into the details of this calculation, let us make
some more general remarks about this classic test of general
relativity. The precision of Eddington’s original test was only about
±30%. This was not improved upon for a long time, but the
Hipparcos satellite has refined the limit to a fraction of a percent. A
better technique is radio astronomy, which allows measurements to
be carried out without waiting for an eclipse. One merely has to
wait for the sun to pass in front of a strong, compact radio source
such as a quasar. These techniques have now verified the
deflection of light predicted by general relativity to a relative precision of
about $10^{-5}$.$^{10}$


General relativity also makes an ironclad prediction that this
deflection is independent of the wavelength of the light. The effect
is predicted to be purely geometrical: the light rays follow lightlike
geodesics. If, on the other hand, the deflection had been due to some
optical effect, such as refraction in the sun’s corona, we would expect
the deflection to be very strongly dependent on wavelength. The
agreement with general relativity for a great range of wavelengths
from radio to visible light (about four orders of magnitude) makes
such an explanation extremely implausible. General relativity has
passed tests involving the deflection of radio waves by Jupiter and
by gravitational lensing on galactic scales. We could imagine that
other theories might also reproduce these results, but the predictions
of some theories that were seriously considered turn out to be off by
factors of 2.


$^{10}$ For a review article on this topic, see Clifford Will, “The Confrontation
between General Relativity and Experiment,” arxiv.org/abs/1403.7377.


We now turn to the detailed calculation of the effect. It is
possible to calculate a precise value for the deflection using analytic
approximations very much like those used to determine the perihelion
advance in section 6.2.6. However, some of the details would have
to be changed. For example, it is no longer possible to parametrize
the trajectory using the proper time $s$, since a light ray has $ds = 0$;
we must use an affine parameter. Let us instead use this as an
example of the numerical technique for solving the geodesic
equation, first demonstrated in section 5.9.2 on page 189. Modifying our
earlier program, we have the following:


 1  import math
 2
 3  # constants, in SI units:
 4  G = 6.67e-11       # gravitational constant
 5  c = 3.00e8         # speed of light
 6  m_kg = 1.99e30     # mass of sun
 7  r_m = 6.96e8       # radius of sun
 8
 9  # From now on, all calculations are in units of the
10  # radius of the sun.
11
12  # mass of sun, in units of the radius of the sun:
13  m_sun = (G/c**2)*(m_kg/r_m)
14  m = 1000.*m_sun
15  print("m/r=",m)
16
17  # Start at point of closest approach.
18  # initial position:
19  t=0
20  r=1 # closest approach, grazing the sun's surface
21  phi=-math.pi/2
22  # initial derivatives of coordinates w.r.t. lambda
23  vr = 0
24  vt = 1
25  vphi = math.sqrt((1.-2.*m/r)/r**2)*vt # gives ds=0, lightlike
26
27  l = 0    # affine parameter lambda
28  l_max = 20000.
29  epsilon = 1e-6 # controls how fast lambda varies
30  while l<l_max:
31    dl = epsilon*(1.+r**2) # giant steps when farther out
32    l = l+dl
33    # Christoffel symbols:
34    Gttr = m/(r**2-2*m*r)
35    Grtt = m/r**2-2*m**2/r**3
36    Grrr = -m/(r**2-2*m*r)
37    Grphiphi = -r+2*m
38    Gphirphi = 1./r
39    # second derivatives:
40    #  The factors of 2 are because we have, e.g., G^a_{bc}=G^a_{cb}
41    at   = -2.*Gttr*vt*vr
42    ar   = -(Grtt*vt*vt + Grrr*vr*vr + Grphiphi*vphi*vphi)
43    aphi = -2.*Gphirphi*vr*vphi
44    # update velocity:
45    vt = vt + dl*at
46    vr = vr + dl*ar
47    vphi = vphi + dl*aphi
48    # update position:
49    r = r + vr*dl
50    t = t + vt*dl
51    phi = phi + vphi*dl
52
53  # Direction of propagation, approximated in asymptotically flat coords.
54  # First, differentiate (x,y)=(r cos phi,r sin phi) to get vx and vy:
55  vx = vr*math.cos(phi)-r*math.sin(phi)*vphi
56  vy = vr*math.sin(phi)+r*math.cos(phi)*vphi
57  prop = math.atan2(vy,vx) # inverse tan of vy/vx, in the proper quadrant
58  prop_sec = prop*180.*3600/math.pi
59  print("final direction of propagation = %6.2f arc-seconds" % prop_sec)


At line 14, we take the mass to be 1000 times greater than the 
mass of the sun. This helps to make the deflection easier to calcu- 
late accurately without running into problems with rounding errors. 
Lines 17-25 set up the initial conditions to be at the point of closest 
approach, as the photon is grazing the sun. This is easier to set 
up than initial conditions in which the photon approaches from far 
away. Because of this, the deflection angle calculated by the pro- 
gram is cut in half. Combining the factors of 1000 and one half, the 
final result from the program is to be interpreted as 500 times the 
actual deflection angle. 


The result is that the deflection angle is predicted to be 870
seconds of arc. As a check, we can run the program again with
$m = 0$; the result is a deflection of $-8$ seconds, which is a measure
of the accumulated error due to rounding and the finite increment
used for $\lambda$.


Dividing by 500, we find that the predicted deflection angle is
1.74 seconds, which, expressed in radians, is exactly $4Gm/c^2 r$. The
unitless factor of 4 is in fact the correct result in the case of small
deflections, i.e., for $m/r \ll 1$.
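
The weak-field formula itself can be evaluated directly for comparison with the numerical result. This is a short sketch using the same rounded constants as the program above; the function name is a hypothetical one chosen for illustration.

```python
import math

# constants, in SI units (same rounded values as in the program above):
G = 6.67e-11       # gravitational constant
c = 3.00e8         # speed of light
m_kg = 1.99e30     # mass of sun
r_m = 6.96e8       # radius of sun

def weak_field_deflection(mass_kg, radius_m):
    """Deflection angle 4Gm/(c^2 r), in radians, valid for m/r << 1."""
    return 4.0 * G * mass_kg / (c**2 * radius_m)

RAD_TO_ARCSEC = 180.0 * 3600.0 / math.pi
deflection_arcsec = weak_field_deflection(m_kg, r_m) * RAD_TO_ARCSEC
print("4Gm/c^2r = %.2f arc-seconds" % deflection_arcsec)
```

With these inputs the formula lands within about a percent of the figure obtained by dividing the program's output by 500.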


Although the numerical technique has the disadvantage that it 
doesn’t let us directly prove a nice formula, it has some advantages 
as well. For one thing, we can use it to investigate cases for which 
the approximation $m/r \ll 1$ fails. For $m/r = 0.3$, the numerical
technique gives a deflection of 222 degrees, whereas the weak-field
approximation $4Gm/c^2 r$ gives only 69 degrees. What is happening
here is that we’re getting closer and closer to the event horizon of a 
black hole. Black holes are the topic of section 6.3, but it should be 
intuitively reasonable that something wildly nonlinear has to happen 
as we get close to the point where the light wouldn’t even be able 
to escape. 


6.3 Black holes 
6.3.1 Singularities 


A provocative feature of the Schwarzschild metric is that it has 
elements that blow up at r = 0 and at r = 2m. If this is a description 
of the sun, for example, then these singularities are of no physical 
significance, since we only solved the Einstein field equation for the 
vacuum region outside the sun, whereas r = 2m would lie about 3 
km from the sun’s center. Furthermore, it is possible that one or 
both of these singularities is nothing more than a spot where our 
coordinate system misbehaves. This would be known as a coordinate 
singularity. For example, the metric of ordinary polar coordinates 
in a Euclidean plane has $g^{\theta\theta} \to \infty$ as $r \to 0$.


One way to test whether a singularity is a coordinate singularity
is to calculate a scalar measure of curvature, whose value is
independent of the coordinate system. We can take the trace of the Ricci
tensor, $R^a{}_a$, known as the scalar curvature or Ricci scalar, but since
the Ricci tensor is zero, it’s not surprising that that is zero. A
different scalar we can construct is the product $R^{abcd}R_{abcd}$ of the Riemann
tensor with itself. This is known as the Kretschmann invariant. The
Maxima command lriemann(true) displays the nonvanishing
components of $R_{abcd}$. The component that misbehaves the most severely
at $r = 0$ is $R_{trrt} = 2m/r^3$. Because of this, the Kretschmann
invariant blows up like $r^{-6}$ as $r \to 0$. This shows that the singularity at
$r = 0$ is a real, physical singularity.
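
To make the contrast between the two radii concrete, here is a small sketch that evaluates the Kretschmann invariant. It assumes the standard closed form $48m^2/r^6$ for the Schwarzschild spacetime in geometrized units, which is quoted here rather than derived in the text; the function name is hypothetical.

```python
def kretschmann(r, m=1.0):
    """Kretschmann invariant of the Schwarzschild spacetime,
    ASSUMING the standard closed form 48 m^2 / r^6 (G = c = 1)."""
    return 48.0 * m**2 / r**6

m = 1.0
# Perfectly finite at r = 2m:
print("K(r=2m)    =", kretschmann(2.0 * m, m))
# Grows like r^-6 as r -> 0:
for r in (1.0, 0.1, 0.01):
    print("K(r=%g) = %.3e" % (r, kretschmann(r, m)))
```

A finite value at $r = 2m$ is consistent with a coordinate singularity there, but a single well-behaved scalar is not by itself a proof.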


The singularity at $r = 2m$, on the other hand, turns out to
be only a coordinate singularity. To prove this, we have to use
some technique other than constructing scalar measures of
curvature. Even if every such scalar we construct is finite at $r = 2m$, that
doesn’t prove that every such scalar we could construct is also well
behaved. We can instead search for some other coordinate system
in which to express the solution to the field equations, one in which
no such singularity appears. A partially successful change of
coordinates for the Schwarzschild metric, found by Eddington in 1924, is
$t \to t' = t - 2m\ln(r - 2m)$ (see problem 8 on page 237). This makes
the covariant metric finite at r = 2m, although the contravariant 
metric still blows up there. A more complicated change of coor- 
dinates that completely eliminates the singularity at r = 2m was 
found by Eddington and Finkelstein in 1958, establishing that the 
singularity was only a coordinate singularity. Thus, if an observer is 
so unlucky as to fall into a black hole, he will not be subjected to in- 
finite tidal stresses — or infinite anything — at r = 2m. He may not 
notice anything special at all about his local environment. (Or he 
may already be dead because the tidal stresses at r > 2m, although 
finite, were nevertheless great enough to kill him. See problem 15.) 


6.3.2 Event horizon 


Even though $r = 2m$ isn’t a real singularity, interesting things
do happen there. For $r < 2m$, the sign of $g_{tt}$ becomes negative,
while $g_{rr}$ is positive. In our $+---$ signature, this has the
following interpretation. For the world-line of a material particle, $ds^2$
is supposed to be the square of the particle’s proper time, and it
must always be positive. If a particle had a constant value of $r$, for
$r < 2m$, it would have $ds^2 < 0$, which is impossible.


The timelike and spacelike characters of the r and t coordinates 
have been swapped, so r acts like a time coordinate. 


Thus for an object compact enough that r = 2m is exterior, 
r = 2m is an event horizon: future light cones tip over so far that 
they do not allow causal relationships to connect with the spacetime 
outside. In relativity, event horizons do not occur only in the context 
of black holes; their properties, and some of the implications for 
black holes, have already been discussed in section 6.1. 


The gravitational time dilation in the Schwarzschild field,
relative to a clock at infinity, is given by the square root of the $g_{tt}$
component of the metric. This goes to zero at the event horizon,
meaning that, for example, a photon emitted from the event horizon 
will be infinitely redshifted when it reaches an observer at infinity. 
This makes sense, because the photon is then undetectable, just as 
it would be if it had been emitted from inside the event horizon. 


6.3.3 Infalling matter 


If matter is falling into a black hole, then due to time dilation
an observer at infinity “sees” that matter as slowing down more and
more as it approaches the horizon. This has some counterintuitive
effects. A radially infalling particle has $d^2r/dt^2 > 0$ once it falls
past a certain point, which could be interpreted as a gravitational
repulsion. The observer at infinity may also be led to describe the
black hole as consisting of an empty, spherical shell of matter that
never quite made it through the horizon. If asked what holds the
shell up, the observer could say that it is held up by gravitational
repulsion.
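
The sign change in $d^2r/dt^2$ can be seen explicitly in one special case. The sketch below adds an assumption that is not in the text: the particle falls radially from rest at infinity, so that $E = 1$ and $L = 0$. Combining $E = (1-2m/r)\,dt/ds$ with $\dot{r}^2 = E^2 - U^2$ from section 6.2.6 then gives $dr/dt = -(1-2m/r)\sqrt{2m/r}$, and differentiating once more gives the closed form coded here. The function name is hypothetical.

```python
def coordinate_acceleration(r, m=1.0):
    """d^2 r / dt^2 in Schwarzschild coordinates for radial infall,
    ASSUMING the particle started from rest at infinity (E = 1, L = 0).
    Obtained by differentiating dr/dt = -(1 - 2m/r) * sqrt(2m/r)."""
    x = 2.0 * m / r
    return -(1.0 - x) * x * (1.0 - 3.0 * x) / (2.0 * r)

m = 1.0
for r in (20.0, 10.0, 6.0, 4.0, 2.5):
    print("r = %5.1f m   d2r/dt2 = %+.5f" % (r, coordinate_acceleration(r, m)))
```

Under this assumption the coordinate acceleration is negative far away, vanishes at $r = 6m$, and is positive between there and the horizon, which is the behavior the observer at infinity might describe as repulsion.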


There is actually nothing wrong with any of this, but one should 
realize that it is only one possible description in one possible coor- 
dinate system. An observer hovering just outside the event horizon 
sees a completely different picture, with matter falling past at ve- 
locities that approach the speed of light as it comes to the event 
horizon. If an atom emits a photon from the event horizon, the 
hovering observer sees it as being infinitely red-shifted, but explains 
the red-shift as a kinematic one rather than a gravitational one. 


We can imagine yet a third observer, one who free-falls along 
with the infalling matter. According to this observer, the gravita- 
tional field is always zero, and it takes only a finite time to pass 
through the event horizon. 


If a black hole has formed from the gravitational collapse of a 
cloud of matter, then some of our observers can say that “right now” 
the matter is located in a spherical shell at the event horizon, while 
others can say that it is concentrated at an infinitely dense singular- 
ity at the center. Since simultaneity isn’t well defined in relativity, 
it’s not surprising that they disagree about what’s happening “right 
now.” Regardless of where they say the matter is, they all agree on 
the spacetime curvature. In fact, Birkhoff’s theorem (p. 281) tells us 
that any spherically symmetric vacuum spacetime is Schwarzschild 
in form, so it doesn’t matter where we say the matter is, as long as 
it’s distributed in a spherically symmetric way and surrounded by 
vacuum. 


A particularly nice way of summarizing and understanding these 
issues is with the use of a Penrose diagram, as discussed in section 
7.3.3. 


6.3.4 Expected formation 


Einstein and Schwarzschild did not believe, however, that any of
these features of the Schwarzschild metric were more than a
mathematical curiosity, and the term “black hole” was not invented until
1967, by John Wheeler. There is quite a bit of evidence these
days that our universe does contain objects that have undergone
complete gravitational collapse, in the sense that their mass $M$ is
contained within a radius $r < M$ (in geometrized units). These
objects are probably black holes, although doubts have been raised
recently as to whether they are in fact other objects such as naked
singularities.$^{11}$ Supposing that black holes do exist, there is also the
question of what sizes they come in.


$^{11}$ See sec. 6.3.6, p. 248, and, e.g., Joshi et al., arxiv.org/abs/1304.7331.


We might expect naively that since gravity is an attractive force, 
there would be a tendency for any primordial cloud of gas or dust 
to spontaneously collapse into a black hole. But clouds of less than 
about $0.1M_\odot$ (0.1 solar masses) form planets, which achieve a
permanent equilibrium between gravity and internal pressure. Heavier
objects initiate nuclear fusion, but those with masses above about
$100M_\odot$ are immediately torn apart by their own solar winds. In the
range from 0.1 to $100M_\odot$, stars form. As discussed in section 4.4.3,
those with masses greater than about a few $M_\odot$ are expected to
form black holes when they die. We therefore expect, on theoretical 
grounds, that the universe should contain black holes with masses 
ranging from a few solar masses to a few tens of solar masses. 


6.3.5 Observational evidence 


A black hole is expected to be a very compact object, with a 
strong gravitational field, that does not emit any of its own light. A 
bare, isolated black hole would be difficult to detect, except perhaps 
via its lensing of light rays that happen to pass by it. But if a black 
hole occurs in a binary star system, it is possible for mass to be 
transferred onto the black hole from its companion, if the compan- 
ion’s evolution causes it to expand into a giant and intrude upon 
the black hole’s gravity well. The infalling gas would then get hot 
and emit radiation before disappearing behind the event horizon. 
The object known as Cygnus X-1 is the best-studied example. This 
X-ray-emitting object was discovered by a rocket-based experiment 
in 1964. It is part of a double-star system, the other member being 
a blue supergiant. They orbit their common center of mass with 
a period of 5.6 days. The orbit is nearly circular, and has a semi- 
major axis of about 0.2 times the distance from the earth to the sun. 
Applying Kepler’s law of periods to these data constrains the sum 
of the masses, and knowledge of stellar structure fixes the mass of 
the supergiant. The result is that the mass of Cygnus X-1 is greater 
than about 10 solar masses, and this is confirmed by multiple meth- 
ods. Since this is far above the Tolman-Oppenheimer-Volkoff limit, 
Cygnus X-1 is believed to be a black hole, and its X-ray emissions 
are interpreted as the radiation from the disk of superheated
material accreting onto it from its companion. It is believed to have
more than 90% of the maximum possible spin for a black hole of its
mass.[11]
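The Kepler's-law step can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the quoted 0.2 AU is the semi-major axis of the relative orbit and that the period is 5.6 days; the function name is illustrative only. In units of AU, years, and solar masses, Kepler's law of periods reads m1 + m2 = a^3/P^2.

```python
# Kepler's third law in solar-system units: with a in AU and P in years,
# the total mass m1 + m2 in solar masses equals a**3 / P**2.
DAYS_PER_YEAR = 365.25


def total_mass_solar(a_au, period_days):
    """Sum of the two masses, in solar masses, from Kepler's law of periods."""
    period_years = period_days / DAYS_PER_YEAR
    return a_au**3 / period_years**2


if __name__ == "__main__":
    # Inputs quoted in the text for the Cygnus X-1 binary.
    m_total = total_mass_solar(a_au=0.2, period_days=5.6)
    print(f"m1 + m2 is roughly {m_total:.0f} solar masses")
```

With these round inputs the sum comes out to a few tens of solar masses. Subtracting an independently estimated supergiant mass then bounds the mass of the compact object, which is the step the text attributes to knowledge of stellar structure.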


11 Gou et al., “The Extreme Spin of the Black Hole in Cygnus X-1,”
http://arxiv.org/abs/1106.3690


a / A black hole accretes matter


Around the turn of the 21st century, new evidence was found
for the prevalence of supermassive black holes near the centers of
nearly all galaxies, including our own. Near our galaxy’s center is
an object called Sagittarius A*, detected because nearby stars orbit
around it. The orbital data show that Sagittarius A* has a mass
of about four million solar masses, confined within a sphere with
a radius less than $2.2 \times 10^7$ km. There is no known astrophysical
model that could prevent the collapse of such a compact object into
a black hole, nor is there any plausible model that would allow this
much mass to exist in equilibrium in such a small space, without
emitting enough light to be observable.
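To see how tight this size bound is, one can compare it with the Schwarzschild radius for the same mass. This is a rough sketch using rounded SI constants; the constant and function names are assumptions made for illustration.

```python
# Schwarzschild radius r_s = 2GM/c^2 for a given mass, in kilometers.
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg


def schwarzschild_radius_km(mass_solar):
    """Schwarzschild radius, in km, of a mass given in solar masses."""
    return 2.0 * G * mass_solar * M_SUN / C**2 / 1000.0


if __name__ == "__main__":
    r_s = schwarzschild_radius_km(4.0e6)   # mass quoted for Sagittarius A*
    bound = 2.2e7                          # observational size bound, km
    print(f"r_s is about {r_s:.2e} km; the bound is {bound / r_s:.1f} r_s")
```

On these numbers the observational bound is less than twice the Schwarzschild radius, which is why no ordinary equilibrium configuration is considered plausible.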


The existence of supermassive black holes is surprising. Gas 
clouds with masses greater than about 100 solar masses cannot nor- 
mally form stable stars, so supermassive black holes cannot be the 
end-point of the evolution of heavy stars. Mergers of multiple stars 
to form more massive objects are generally statistically unlikely, 
since a star is such a small target in relation to the distance be- 
tween the stars. Once astronomers were confronted with the empir- 
ical fact of their existence, a variety of mechanisms was proposed for 
their formation. Little is known about which of these mechanisms 
is correct, although the existence of quasars in the early universe is 
interpreted as evidence that mass accreted rapidly onto supermas- 
sive black holes in the early stages of the evolution of the galaxies. 
As of 2016, an explanation getting a lot of attention is that in the 
early universe, there was a brief period in which the ambient con- 
ditions allowed the creation of supermassive black holes by direct 
collapse.[12]


A skeptic could object that although Cygnus X-1 and Sagittar- 
ius A* are more compact than is believed possible for a neutron 
star, this does not necessarily prove that they are black holes. In- 
deed, speculative theories have been proposed in which exotic ob- 
jects could exist that are intermediate in compactness between black 
holes and neutron stars. These hypothetical creatures have names 
like black stars, gravastars, quark stars, boson stars, Q-balls, and 
electroweak stars. Although there is no evidence that these theories 
are right or that these objects exist, we are faced with the question 
of how to determine whether a given object is really a black hole or 
one of these other species. The defining characteristic of a black hole 
is that it has an event horizon rather than a physical surface. We 
currently have two ways of probing the structure of these stars at 
the radii where general relativity predicts the existence of an event 
horizon. 


12 See, e.g., http://arxiv.org/abs/1402.5675


If an object is not a black hole, then by conservation of energy
any matter that falls onto it must release its gravitational
potential energy when it hits the object’s surface. Cygnus X-1 has a
copious supply of matter falling onto it from its supergiant
companion, and Sagittarius A* likewise accretes a huge amount of gas
from the stellar wind of nearby stars. By analyzing millimeter and
infrared very-long-baseline-interferometry observations, Broderick,
Loeb, and Narayan[13] have shown that if Sagittarius A* had a
surface, then the luminosity of this surface must be less than 0.3% of
the luminosity of the accretion disk. But this is not physically
possible, because there are fundamental limits on the efficiency with
which the gas can radiate away its energy before hitting the
surface. We can therefore conclude that Sagittarius A* must have an
event horizon. Its event horizon may be imaged directly in the near
future.[14]


A second approach is through the observation of gravitational 
waves. As discussed in more detail in ch. 9, 2016 saw the first direct 
observation of gravitational waves. The waveform that was detected 
(figure b, p. 374) fits very well with the predictions of general rela- 
tivity for the merger of two black holes. It seems very unlikely that 
a waveform with this time-scale and characteristic shape could have 
been produced unless general relativity’s description of black holes 
is correct in detail. 


6.3.6 Singularities and cosmic censorship 
Informal ideas 


Since we observe that black holes really do exist, maybe we 
should take the singularity at r = 0 seriously. Physically, it says
that the mass density and tidal forces blow up to infinity there. 


Generally when a physical theory says that observable quantities 
blow up to infinity at a particular point, it means that the theory has 
reached the point at which it can no longer make physical predic- 
tions. For instance, Maxwell’s theory of electromagnetism predicts 
that the electric field blows up like $r^{-2}$ near a point charge, and
this implies that infinite energy is stored in the field within a finite 
radius around the charge. Physically, this can’t be right, because 
we know it only takes 511 keV of energy to create an electron out 
of nothing, e.g., in nuclear beta decay. The paradox is resolved by 
quantum electrodynamics, which modifies the description of the vac- 
uum around the electron to include a sea of virtual particles popping 
into and out of existence. 
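The divergence of the classical field energy can be made concrete. The sketch below integrates the energy density of a point charge's Coulomb field, which gives U = e^2/(8 pi epsilon_0) (1/r_inner - 1/r_outer), and asks at what inner radius the energy stored outside already equals 511 keV. The function names are illustrative assumptions; the constants are standard SI values.

```python
import math

E_CHARGE = 1.602176634e-19      # elementary charge, C
EPSILON_0 = 8.8541878128e-12    # vacuum permittivity, F/m
MC2_JOULES = 511.0e3 * E_CHARGE # 511 keV expressed in joules


def field_energy(r_inner, r_outer=math.inf):
    """Energy (J) of the Coulomb field between radii r_inner and r_outer."""
    k = E_CHARGE**2 / (8.0 * math.pi * EPSILON_0)
    return k * (1.0 / r_inner - 1.0 / r_outer)


def radius_where_energy_equals(energy):
    """Inner radius (m) outside of which the field stores the given energy."""
    return E_CHARGE**2 / (8.0 * math.pi * EPSILON_0 * energy)


if __name__ == "__main__":
    r = radius_where_energy_equals(MC2_JOULES)
    print(f"field energy outside r = {r:.2e} m already equals 511 keV")
    print(f"shrinking r tenfold gives {field_energy(r / 10) / MC2_JOULES:.0f} times more")
```

Because the stored energy grows like 1/r_inner, the classical description must fail somewhere around the femtometer scale, well before r reaches zero.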


In the case of a black hole singularity, it is possible that quantum 
mechanical effects at the Planck scale prevent the formation of a 
singularity. Unfortunately, we are unlikely to find any empirical 
evidence about this, since black holes always seem to come clothed 
in event horizons, so we outside observers cannot extract any data 
about the singularity inside. Even if we take a suicidal trip into 
a black hole, we get no data about the singularity, because the 
singularity in the Schwarzschild metric is spacelike, not timelike, 
and therefore it always lies in our future light cone, never in our 
past. 


13 arxiv.org/abs/0903.1105
14 arxiv.org/abs/0906.4040


In a way, the inaccessibility of singularities is a good thing. If a
singularity exists, it is a point at which all the known laws of physics 
break down, and physicists therefore have no way of predicting any- 
thing about its behavior. There is likewise no great crisis for physics 
due to the Big Bang singularity or the Big Crunch singularity that 
occurs in some cosmologies in which the universe recollapses; we 
have no reasonable expectation of being able to make and test pre- 
dictions or retrodictions that extend beyond the beginning or end 
of the universe. 


What would be a crushing blow to the enterprise of physics would 
be a singularity that could sit on someone’s desk. As John Earman 
of the University of Pittsburgh puts it, anything could pop out of 
such a “naked” singularity (defined formally on p. 247), including 
green slime or your lost socks. 


Penrose’s cosmic censorship conjecture states that the laws of 
physics prevent the formation of naked singularities from nonsingu- 
lar and generic initial conditions. “Generic” is a necessary addition 
to Penrose’s original 1969 formulation, since Choptuik showed in 
1993 that certain perfectly fine-tuned initial conditions allowed
collapse to a naked singularity.[16] As of 2017, evidence is accumulating
that cosmic censorship is false. This is discussed at greater length 
in section 6.3.6, p. 248. 


Formal definitions 


The remainder of this subsection provides a more formal expo- 
sition of the definitions relating to singularities. It can be skipped 
without loss of continuity. 


The reason we care about singularities is that they indicate an 
incompleteness of the theory, and the theory’s inability to make 
predictions. One of the simplest things we could ask any theory 
to do would be to predict the trajectories of test particles. For 
example, Maxwell’s equations correctly predict the motion of an 
electron in a uniform magnetic field, but they fail to predict the 
motion of an electron that collides head-on with a positron. It might 
have been natural for someone in Maxwell’s era (assuming they were 
informed about the existence of positrons and told to assume that 
both particles were pointlike) to guess that the two particles would 
scatter through one another at $\theta = 0$, their velocities momentarily
becoming infinite. But it would have been equally natural for this 
person to refuse to make a prediction. 


Similarly, if a particle hits a black hole singularity, we should not 
expect general relativity to make a definite prediction. It doesn’t, 
because the geodesic equation breaks down. 


16 Phys. Rev. Lett. 70, p. 9


We would therefore like to define a singularity as a situation in
which the geodesics of test particles can’t be extended indefinitely.
But what does “indefinitely” mean? If the test particle is a photon,
then the metric length of its world-line is zero. We get around this
by defining length in terms of an affine parameter.


Definition: A spacetime is said to be geodesically incomplete if 
there exist timelike or lightlike geodesics that cannot be extended 
beyond some finite affine parameter into the past or future. 


This is also a pretty good working definition of what we mean when 
we say that a spacetime contains a singularity, although it may not 
be optimal for all purposes.[17] The Schwarzschild spacetime has a
singularity at r = 0, but not at the event horizon, since geodesics 
continue smoothly past the event horizon. Cosmological spacetimes 
contain a Big Bang singularity which prevents geodesics from being 
extended beyond a certain point in the past. 


Actual singularities involving geodesic incompleteness are to be 
distinguished from coordinate singularities, which are not really sin- 
gularities at all. In the Schwarzschild spacetime, as described in 
Schwarzschild’s original coordinates, some components of the met- 
ric blow up at the event horizon, but this is not an actual singularity. 
This coordinate system can be replaced with a different one in which 
the metric is well behaved. 


A harmless blow-up Example: 1
Let’s define coordinates $(t, y)$ in the region of spacetime where
you’re sitting and reading this book. Let $(0, 0)$ be your current
time and position, and for convenience let this be an inertial frame
(so that your motion is not geodesic). The Riemann tensor, expressed
in these coordinates, has a component $R_{tyyt} = 2Gm/r^3$,
where $m$ and $r$ are the mass and radius of the earth. This has the
finite value of $1.5 \times 10^{-6}\ \text{s}^{-2}$, which expresses the
strength of a tidal effect near the earth’s surface.


Now define a new coordinate $u = y^2$. Applying the tensor
transformation law, we have


$$R_{tuut} = R_{tyyt} \left(\frac{\partial y}{\partial u}\right)^2,$$


which is infinite at $y = 0$. This example demonstrates that we
cannot test for a singularity by looking for a blow-up of the
components of a curvature tensor at certain coordinates or as the
coordinates approach some limit.
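A numerical version of this example may help. The sketch below evaluates the tidal scale Gm/r^3 for the earth and then applies the factor (dy/du)^2 = 1/(4y^2) that comes from u = y^2. The constants are rounded standard values and the function names are assumptions for illustration. With these inputs Gm/r^3 is about 1.5 x 10^-6 s^-2 and 2Gm/r^3 is about 3.1 x 10^-6 s^-2, so the order of magnitude is the same either way.

```python
GM_EARTH = 3.986e14   # G times the earth's mass, m^3/s^2
R_EARTH = 6.371e6     # mean radius of the earth, m


def tidal_scale():
    """Characteristic tidal quantity G m / r^3 at the earth's surface, s^-2."""
    return GM_EARTH / R_EARTH**3


def component_in_u(r_tyyt, y):
    """R_tuut = R_tyyt (dy/du)^2 for u = y^2, where dy/du = 1/(2y)."""
    dy_du = 1.0 / (2.0 * y)
    return r_tyyt * dy_du**2


if __name__ == "__main__":
    r_tyyt = 2.0 * tidal_scale()
    print(f"G m / r^3 = {tidal_scale():.2e} s^-2")
    for y in (1.0, 1e-3, 1e-6):
        # The component grows without bound as y -> 0; the physics is unchanged.
        print(f"y = {y:g}: R_tuut = {component_in_u(r_tyyt, y):.2e}")
```

The growth in the printed values comes entirely from the choice of coordinate u, since the underlying tidal effect is the same finite number throughout.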


There are two types of singularities: curvature singularities and 
non-curvature singularities. 


17 Geroch, “What is a singularity in general relativity?,” Ann. Phys.
48 (1968) 526.


The big bang and black hole singularities are examples of
curvature singularities, which can often be recognized because there
are scalar measures of curvature such as $R^{abcd}R_{abcd}$, known as the
Kretschmann invariant, that blow up. These indicate that tidal
forces blow up to infinity, and would destroy any observer.


The reason curvature scalars are useful as tests for a curvature 
singularity is that since they’re scalars, they can’t diverge in one 
coordinate system but stay finite in another (cf. example 1). A 
sufficient condition for a singularity to be a curvature singularity is 
if timelike or lightlike geodesics can only be extended to some finite 
affine parameter, and some curvature scalar (not necessarily every 
such scalar) approaches infinity as we approach this value of the 
affine parameter. 


But we should not expect this to be a necessary condition for a 
curvature singularity. Example 2 below shows that the most com- 
monly occurring curvature scalars may not be enough to catch the 
presence of a singularity. This is not too surprising, since curvature 
scalars do not suffice to tell us everything there is to know about 
the curvature of a spacetime (example 3). 


Incompleteness with finite curvature scalars Example: 2
Consider the 1+1-dimensional spacetime described by the metric


$$ds^2 = A(dt^2 - dx^2)$$
$$A = 1/(1+e^t),$$


with $-\infty < x < \infty$ and $-\infty < t < \infty$. For large negative t it
is indistinguishable from Minkowski space. The following Maxima
code computes its Riemann tensor, the scalar curvature R,
and the Kretschmann invariant K.


load(ctensor);
dim:2;
ct_coords:[t,x];
u:1/(1+exp(t));  /* the conformal factor A */
lg:matrix([u,0],
          [0,-u]); /* covariant metric */
cmetric();
ricci(true);
lriemann(true);
uriemann(true);
scurvature(); /* scalar curvature */
rinvariant(); /* Kretschmann */


The results for the two curvature scalars are
$$R = (1+e^{-t})^{-1} \qquad \text{and} \qquad K = (1+e^{-t})^{-2},$$


both of which are finite everywhere; they go from 0 at large neg- 
ative times to 1 at large positive times. 
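As an independent check on the scalar curvature, one can use the fact that for a two-dimensional metric of the form A times the flat metric, with A depending only on t, the magnitude of the scalar curvature is |(1/A) d^2(ln A)/dt^2|; the overall sign depends on convention. The sketch below evaluates this by finite differences and compares it with 1/(1+e^-t). The function names and the step size are assumptions for illustration.

```python
import math


def ln_A(t):
    """Logarithm of the conformal factor A = 1/(1 + e^t)."""
    return -math.log1p(math.exp(t))


def scalar_curvature_magnitude(t, h=1e-3):
    """|R| = |(1/A) d^2(ln A)/dt^2|, via a central second difference."""
    second_derivative = (ln_A(t + h) - 2.0 * ln_A(t) + ln_A(t - h)) / h**2
    return abs(second_derivative / math.exp(ln_A(t)))


def closed_form(t):
    """The value 1/(1 + e^-t), which runs from 0 to 1 as t increases."""
    return 1.0 / (1.0 + math.exp(-t))


if __name__ == "__main__":
    for t in (-3.0, 0.0, 3.0):
        print(t, scalar_curvature_magnitude(t), closed_form(t))
```

The two columns agree to within the accuracy of the finite difference, and both stay between 0 and 1.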
From these results we would not imagine that there was any
singularity present, but blow-ups of curvature scalars are only a
sufficient condition for geodesic incompleteness, not a necessary
one. Consider the timelike curve x = 0, which by symmetry is a
geodesic. If we integrate the proper time along this geodesic, we
get a finite limit as $t \to \infty$. Since proper time qualifies as an affine
parameter, this geodesic is incomplete.
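The finiteness of the proper time can be verified directly. Along x = 0 the proper time is the integral of sqrt(A) dt = dt/sqrt(1+e^t), and the substitution s = sqrt(1+e^t) gives the closed form ln[(s+1)/(s-1)] for the proper time remaining from a given t to infinity. The sketch below compares that expression with a Simpson's-rule integration; the function names, the cutoff at t = 60, and the step count are assumptions for illustration.

```python
import math


def integrand(t):
    """d(tau)/dt = sqrt(A) along the geodesic x = 0."""
    return 1.0 / math.sqrt(1.0 + math.exp(t))


def proper_time(t_start, t_end, n=20000):
    """Composite Simpson's rule for the proper time; n must be even."""
    h = (t_end - t_start) / n
    total = integrand(t_start) + integrand(t_end)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(t_start + i * h)
    return total * h / 3.0


def proper_time_to_infinity(t_start):
    """Closed form for the proper time from t_start to t = +infinity."""
    s = math.sqrt(1.0 + math.exp(t_start))
    return math.log((s + 1.0) / (s - 1.0))


if __name__ == "__main__":
    print(proper_time(0.0, 60.0), proper_time_to_infinity(0.0))
```

Starting from t = 0, both methods give about 1.76 in these units, so an observer on this geodesic runs out of spacetime after a finite time on her own clock.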


But it is not so obvious that this spacetime is “really” singular. It
is possible that we could smoothly extend it beyond $t = +\infty$. If so,
then the singularity at $t = +\infty$ would be a kind of fake singularity,
of the type that we could obtain simply by chopping off the part of
Minkowski space with t > 0.


Vanishing curvature scalars Example: 3

We remarked above that curvature scalars do not in general
suffice to tell us everything about the curvature of a spacetime. In
fact, there is an entire class of curved spacetimes such that
every curvature invariant vanishes everywhere. Schmidt[18] gives the
example


$$ds^2 = du\,dv - a^2(u)\,dw^2,$$


where $a$ is an arbitrary nonlinear function. The eigenvalues of
this metric are $1/2$, $-1/2$, and $-a^2$, so its signature is $+ - -$, i.e., this
is general relativity in 2+1 dimensions. A computation shows
that the space is not flat, since, e.g., $R_{uu} = -a''/a$. The u and v
directions are lightlike, so this metric represents a wavelike
disturbance traveling at the speed of light. (Since the Ricci tensor
doesn’t vanish, this isn’t a vacuum solution, and we don’t have a
gravitational wave in vacuum. Such waves, as described in ch. 9,
are transverse and can only exist in 3+1 or more dimensions.)


The lightlike character of u and v motivates us to consider
coordinate transformations of the form $(u, v) \to (uD, v/D)$, because
in the case $a = 1$, which is flat, this would be a Lorentz boost with
a Doppler shift factor D. In the case where D approaches zero,
we are chasing the wave at a velocity approaching c, so the wave
Doppler-shifts to undetectability. All components of the Riemann
tensor, as well as their derivatives, approach zero.


Now consider any curvature scalar $I$ that is expressible as a
continuous function of the Riemann tensor and its derivatives. By
continuity, $I$ approaches zero as $D \to 0$. But curvature scalars
are scalars, so they are invariant under coordinate
transformations. It therefore follows that $I = 0$ identically, regardless
of the value of D. Thus we have a spacetime that, although curved, has
no nonvanishing curvature scalars anywhere.


18 “Why do all the curvature invariants of a gravitational wave vanish?,”
arxiv.org/abs/gr-qc/9404037


b / A conical singularity. The cone has zero intrinsic curvature
everywhere except at its tip. Geodesic 1 can be extended infinitely
far, but geodesic 2 cannot; since the metric is undefined at the tip,
there is no sensible way to define how geodesic 2 should be extended.


Singularities can also occur without any blow-up in the curvature.
An example of this is a conical singularity, figure b. (Cf. figure
b, p. 193.) In 2+1-dimensional relativity, curvature vanishes
identically in the case of a vacuum, and the only kind of singularity
we can have is a non-curvature singularity. Another example
of a non-curvature singularity is provided by the Taub-NUT family
of spacetimes (Hawking and Ellis, sections 5.8 and 8.5), in which
some lightlike geodesics spiral in toward a horizon, but tidal forces
do not blow up at the horizon. There is no clear reason to expect
that non-curvature singularities could actually exist in our universe,
but neither is there any proof that they cannot be formed by natural
processes.


A non-curvature singularity Example: 4
Consider the metric


$$ds^2 = t^{-1}\,dt^2 - t\,d\theta^2$$


in 1+1 dimensions, where $\theta$ is an angle running around the circle.
This is a simplified version of a Taub-NUT spacetime. Lightlike
geodesics have $ds = 0$, so $dt/t = \pm d\theta$, and $\theta = (\text{const}) \pm \ln(\pm t)$,
where the two signs can be chosen independently. Single out the
geodesic $\theta = \ln t$, which is defined only for $t > 0$. It wraps around
the circle infinitely many times as t goes to zero, and we would
like to know whether it is incomplete there. If the affine parameter
goes to infinity as t approaches zero, then the geodesic is not
incomplete.


The nonvanishing Christoffel symbols are $\Gamma^t_{tt} = -1/(2t)$,
$\Gamma^\theta_{t\theta} = \Gamma^\theta_{\theta t} = 1/(2t)$, and
$\Gamma^t_{\theta\theta} = t/2$ (problem 3, p. 209). The resulting
geodesic equations are


$$\ddot{t} = \frac{1}{2t}\dot{t}^2 - \frac{t}{2}\dot{\theta}^2$$
$$\ddot{\theta} = -\frac{1}{t}\dot{\theta}\dot{t},$$


where dots represent differentiation with respect to the affine
parameter $\lambda$. Implicit differentiation of the equation $\theta = \ln t$ gives
$\dot{\theta} = \dot{t}/t$, and plugging this in to the first geodesic equation
results in $\ddot{t} = 0$. We can therefore take $t = \lambda$. (We could also
take $t = a\lambda + b$, which would result in a different and equally valid
affine parameter.) If $\lambda$ had gone to $-\infty$ as t went to zero, then
we would have demonstrated that the geodesic was complete. It
approaches a finite limit instead, which suggests, but does not
prove, that it is incomplete.
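The claim that t is itself an affine parameter along this geodesic can be tested numerically. The following sketch integrates the geodesic equations that follow from the Christoffel symbols above, namely t'' = t'^2/(2t) - (t/2) theta'^2 and theta'' = -theta' t'/t, with a fixed-step fourth-order Runge-Kutta method. It starts on the curve theta = ln t at lambda = 1 and steps toward smaller lambda. The function names, the step count, and the stopping point lambda = 0.05 are assumptions chosen for illustration.

```python
import math


def derivs(state):
    """Right-hand side of the geodesic equations as a first-order system."""
    t, theta, t_dot, theta_dot = state
    t_ddot = t_dot**2 / (2.0 * t) - 0.5 * t * theta_dot**2
    theta_ddot = -theta_dot * t_dot / t
    return (t_dot, theta_dot, t_ddot, theta_ddot)


def rk4_step(state, h):
    """Advance the state by one classical Runge-Kutta step of size h."""
    def shifted(k, factor):
        return tuple(s + factor * ki for s, ki in zip(state, k))
    k1 = derivs(state)
    k2 = derivs(shifted(k1, h / 2.0))
    k3 = derivs(shifted(k2, h / 2.0))
    k4 = derivs(shifted(k3, h))
    return tuple(s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(lam_start=1.0, lam_end=0.05, n=9500):
    """Follow the geodesic from lam_start to lam_end (h is negative here)."""
    h = (lam_end - lam_start) / n
    # Initial data on theta = ln t with t = lambda: t' = 1, theta' = 1/t.
    state = (lam_start, math.log(lam_start), 1.0, 1.0 / lam_start)
    for _ in range(n):
        state = rk4_step(state, h)
    return state


if __name__ == "__main__":
    t, theta, t_dot, theta_dot = integrate()
    print(f"t = {t:.6f}, theta = {theta:.6f}, ln t = {math.log(t):.6f}")
```

The integration stays on the curve theta = ln t with t' = 1 throughout, so the affine parameter decreases only by a finite amount while theta winds toward minus infinity.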


The change of coordinates $\theta \to \theta - \ln t$ allows the
counterclockwise lightlike geodesics to be continued through t = 0. (Because
this transformation is not a diffeomorphism, it is not just a
renaming of points but an actual physical change in the structure of the
spacetime; it is equivalent to cutting apart the halves with t < 0
and t > 0 and gluing them back together in a different way.) The
corresponding geodesics in the clockwise direction, however,
remain incomplete. A different change of coordinates extends the
clockwise but not the counterclockwise ones. In all cases there
are incomplete geodesics, so it still appears that we do have a
singularity. (For more discussion of this example, see Hawking
and Ellis, sec. 5.8.) Since curvature singularities don’t exist in
fewer than 3+1 dimensions, this is a non-curvature singularity. (A
calculation also shows that this particular spacetime is flat.)


A singularity is not considered to be a point or set of points in 
a spacetime; it’s more like a hole in the topology of the manifold. 
For example, the Big Bang didn’t occur at a point or set of points. 
A singularity represents a breakdown in the metric, and without 
a metric we may not even be able to tell the difference between 
one point and many. For more on these issues, see the discussions 
of boundary constructions on p. 275. There is a sense in which a 
black hole singularity is not a thing at all, and has no definable 
characteristics; see p. 283. 


One point, or many? Example: 5
Suppose I have a two-dimensional space with coordinates $(u, v)$,
and I ask you whether $S = \{(u, v) \mid v = 0\}$ is a point or a curve,
while refusing to divulge what metric I have in mind. You’d
probably say S was a curve, and if the metric was $ds^2 = du^2 + dv^2$, you’d
be right. On the other hand, if the metric was $ds^2 = v^2\,du^2 + dv^2$,
S would be a point.


This was an example where there were two possible metrics we 
could imagine. At a singularity, it’s even worse. There is no pos- 
sible metric that we can extend to the singularity. 
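The difference between the two metrics in this example can be seen by measuring a piece of the set v = 0. Along a curve of constant v only du is nonzero, so ds = sqrt(g_uu) du. The sketch below integrates this for both metrics; the second metric is the plane in polar coordinates, with v as the radius and u as the angle. The function names are assumptions for illustration.

```python
import math


def curve_length(g_uu, v0, u_start=0.0, u_end=1.0, n=1000):
    """Length of the curve v = v0 for u_start <= u <= u_end (midpoint rule)."""
    h = (u_end - u_start) / n
    return sum(math.sqrt(g_uu(u_start + (i + 0.5) * h, v0)) * h
               for i in range(n))


def g_uu_flat(u, v):
    """g_uu for ds^2 = du^2 + dv^2."""
    return 1.0


def g_uu_polar(u, v):
    """g_uu for ds^2 = v^2 du^2 + dv^2."""
    return v**2


if __name__ == "__main__":
    print(curve_length(g_uu_flat, 0.0))   # a segment of S has nonzero length
    print(curve_length(g_uu_polar, 0.0))  # the same segment has zero length
```

With the first metric a unit range of u along S has unit length, while with the second the whole of S has zero extent, as expected for the origin of polar coordinates.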


Because a singularity isn’t a point or a point-set, we can’t define 
its timelike or spacelike character in quite the way we would with, 
say, a curve. A timelike singularity, also referred to as a locally 
naked singularity, is one such that an observer with a timelike world- 
line can have the singularity sometimes in his future light-cone and 
sometimes in his past light-cone.[19]


Schwarzschild and Big Bang singularities are spacelike. (Note 
that in the Schwarzschild metric, the Schwarzschild r and t coordi- 
nates swap their timelike and spacelike characters inside the event 
horizon.) 


The definition of a timelike singularity is local. A timelike singu- 
larity would be one that you could have sitting on your desk, where 
you could look at it and poke it with a stick. 


19 Penrose, Gravitational radiation and gravitational collapse; Proceedings of
the Symposium, Warsaw, 1973. Dordrecht, D. Reidel Publishing Co. pp. 82-91,
free online at adsabs.harvard.edu/full/1974IAUS...64...82P


A naked singularity is one from which timelike or lightlike
world-lines can originate and then escape to infinity. The Schwarzschild
metric’s singularity is not naked. This notion is global.


Evidence accumulating against cosmic censorship 


As of 2017, evidence is accumulating that cosmic censorship is 
false. Back in 1969 when Roger Penrose first formulated the hy- 
pothesis, relativists had been strongly influenced by a 1939 calcula- 
tion by Oppenheimer and Snyder for the gravitational collapse of a 
uniform, spherical cloud of “dust,” meaning material particles that 
act like a pressureless ideal fluid (see p. 295). (Cf. p. 146 on the 
Tolman-Oppenheimer-Volkoff limit, derived earlier the same year.) 
Even though Oppenheimer and Snyder were too timid to continue 
the calculation past the formation of an event horizon, their result 
was not taken seriously for years, the notion of a runaway gravita- 
tional collapse being too distant from the state of the art in terms of 
observation. But later workers did complete the calculation. They 
found that a singularity developed, but that the horizon formed 
early enough to cloak it, so that no timelike or lightlike geodesic 
from the singularity could escape to a distant observer. This was 
consistent with a weak version of the cosmic censorship hypothesis, 
that a (globally) naked singularity cannot form from gravitational 
collapse. 


But to interpret the result as evidence for cosmic censorship was 
misleading. With hindsight, there are clear Newtonian reasons to 
suspect that a perfectly homogeneous cloud has properties that are a 
little too special. In the Newtonian version the internal gravitational 
field is proportional to r. Starting from rest at r, a particle has to 
travel a distance r to reach the center, but since the acceleration is 
proportional to r, the time needed to reach the center is the same 
for all particles. There is a Newtonian singularity of infinite density, 
and this occurs at the same time for all particles, which is after the 
formation of a surface from which the escape velocity has any fixed 
value, such as c. Therefore in Newtonian terms, cosmic censorship 
holds, but it holds only because of the perfect homogeneity of the 
cloud. 
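The Newtonian statement that every particle reaches the center at the same time can be checked with the standard free-fall formula. For a particle starting from rest at radius r0, with the enclosed mass M held fixed (an assumption that holds as long as shells of dust do not cross), the fall time is (pi/2) sqrt(r0^3 / (2GM)). For a uniform cloud M is proportional to r0^3, so r0 cancels. The function name and the sample density below are illustrative assumptions.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def free_fall_time(r0, rho):
    """Time (s) for a dust particle starting at rest at r0 to reach r = 0,
    in a cloud of uniform initial density rho (kg/m^3)."""
    enclosed_mass = 4.0 / 3.0 * math.pi * rho * r0**3
    return (math.pi / 2.0) * math.sqrt(r0**3 / (2.0 * G * enclosed_mass))


if __name__ == "__main__":
    rho = 1.0e-15  # an arbitrary sample density
    for r0 in (1.0e3, 1.0e9, 1.0e15):
        print(f"r0 = {r0:.0e} m: t = {free_fall_time(r0, rho):.6e} s")
```

Every starting radius gives the same time, sqrt(3 pi / (32 G rho)), which is the special synchronization that an inhomogeneous density profile would destroy.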


In fact, the general-relativistic version of inhomogeneous
gravitational collapse had already been worked out around 1933 by
Lemaître, Tolman, and Bondi, again for the case of a spherical cloud,
but now with a density profile $\rho(r)$. This family of metrics, called
the Lemaître-Tolman-Bondi metrics, is general enough to include
models of cosmological expansion as well as models of local
gravitational collapse. Tolman applied the collapse model to the formation
of “nebulae,” i.e., galaxies, in the early universe, but did not
follow the evolution of the collapse to its ultrarelativistic dénouement,
as Oppenheimer and Snyder had. When one does so, dealing with
some technical obstacles and imposing some constraints for
physical reasonableness, it turns out that in most cases, the result is a
locally naked singularity.[20] That is, fine-tuning is required in order
to produce something more like a standard black hole. It remains
to be seen whether this holds true when the constraint of perfect
spherical symmetry is relaxed.


This does not necessarily mean on the face of it that cosmic 
censorship is dead, since spacetimes with spherical symmetry are 
themselves finely tuned in some sense, but it is rather a dramatic 
development, since people had imagined for 75 years, based on 
the Oppenheimer-Snyder calculations for homogeneous dust, that 
a black hole was the generic result of runaway gravitational col- 
lapse. Cosmic censorship is in a sense impossible to disprove, since 
part of the research program is to find the most appropriate defini- 
tion of the conjecture, but these results suggest that if it is to be 
true, then it has to be weakened so much as to be of little inter- 
est. In general, a meaningful definition of what it means to violate 
weak cosmic censorship should probably include something like the 
following ingredients. 


1. The initial conditions do not make available an infinite amount 
of energy within a finite region. 


2. The initial conditions do not contain singularities. 
3. Incomplete lightlike geodesics can arrive at a distant observer. 


4. Such a violation still occurs if we impose small perturbations 
on the initial data. 


5. The forms of matter are physically realistic. 


20 Joshi and Malafarina, arxiv.org/abs/1405.1146


If we do not impose something like condition 1, then we can
set up initial conditions that are of no interest because they are
unrealistic. For this reason, one usually studies spacetimes that
are asymptotically flat.[21] Condition 2 expresses the idea that any
singularities that occur should be new ones formed by gravitational
collapse. The censorship violation is expressed by condition 3. The
notion of a distant observer can be further formalized by requiring
that such a geodesic arrive at null infinity, $\mathscr{I}^+$; see p. 272. If
condition 4 is omitted, then clear counterexamples to censorship are
known. However, it is not known whether there is an appropriately
rigorous way to define “small perturbations” here.[22] Realistic matter
fields, condition 5, are expected, for example, not to have negative
mass.[23]


21 Asymptotic flatness was introduced informally on p. 149 and is defined in
detail in section 7.4.2. It may also be necessary to impose a requirement that
the matter fields fall off at some rate as we go to infinity.

22 In technical terms, we do not have any topology or measure defined on
the set of all possible initial conditions. In actual work to date, people have
selected some set of possible initial conditions, described by some small number of
adjustable parameters, and have then tried to test condition 4 using a seemingly
natural topology and measure defined on the space of those parameters.


Because weak cosmic censorship seems to be violated if described 
by these five conditions, people have started looking for additional 
conditions that could salvage the conjecture. 


Wald[24] suggests adding a sixth requirement. He proposes that
the types of matter be further restricted to ones having the property 
that if the metric is fixed, rather than dynamical as in general rela- 
tivity, then no singularities occur. This seems to me to be much too 
strong a condition, and there are indications that it is not sufficient. 


Another proposal is along the following lines. When a naked 
singularity occurs, then we have a region of spacetime for which 
the singularity is inside the past lightcone. The lightlike surface 
constituting the boundary of this region is called a Cauchy horizon. 
An observer who passes beyond the Cauchy horizon can observe 
arbitrary information, i.e., phenomena not predicted by any laws of 
physics, and infinite fluxes of energy. Roger Penrose has, however, 
pointed out that in certain illustrative cases, there is a tendency 
for energy from the entire spacetime prior to the singularity to be 
focused onto the Cauchy horizon. The result could then be that such 
an observer is destroyed when passing through the Cauchy horizon. 
In other words, the Cauchy horizon actually turns into a singularity. 
Penrose’s mechanism appears to fail, however, for a spacetime with 
a positive cosmological constant, which is what we actually have in 
our universe. 


6.3.7 Hawking radiation 
Radiation from black holes 


Since event horizons are expected to emit blackbody radiation, 
a black hole should not be entirely black; it should radiate. This is 
called Hawking radiation. Suppose observer B just outside the event 
horizon blasts the engines of her rocket ship, producing enough ac- 
celeration to keep from being sucked in. By the equivalence princi- 
ple, what she observes cannot depend on whether the acceleration 
she experiences is actually due to a gravitational field. She there- 
fore detects radiation, which she interprets as coming from the event 
horizon below her. As she gets closer and closer to the horizon, the 
acceleration approaches infinity, so the intensity and frequency of 
the radiation grows without limit. 


23 More rigorously, we expect them to satisfy suitable energy conditions, section
8.1.3, p. 307.

24 “Gravitational Collapse and Cosmic Censorship,”
arxiv.org/abs/gr-qc/9710068


A distant observer A, however, sees a different picture.
According to A, B’s time is extremely dilated. A sees B’s acceleration
as being only $\sim 1/m$, where $m$ is the mass of the black hole; A
does not perceive this acceleration as blowing up to infinity as B
approaches the horizon. When A detects the radiation, it is
extremely red-shifted, and it has the spectrum that one would expect
for a horizon characterized by an acceleration $a \sim 1/m$. The result
for a 10-solar-mass black hole is $T \sim 10^{-8}$ K, which is so low that
the black hole is actually absorbing more energy from the cosmic
microwave background radiation than it emits.
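The temperature estimate can be reproduced from the standard Hawking formula T = hbar c^3 / (8 pi G M k_B), which is not derived in the text above and is taken here as an assumption. The constants are standard SI values, and the function names and the rounded microwave background temperature of 2.725 K are illustrative.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
C = 2.99792458e8         # speed of light, m/s
G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
K_B = 1.380649e-23       # Boltzmann constant, J/K
M_SUN = 1.98847e30       # solar mass, kg
T_CMB = 2.725            # cosmic microwave background temperature, K


def hawking_temperature(mass_kg):
    """Hawking temperature (K) of a nonrotating black hole of the given mass."""
    return HBAR * C**3 / (8.0 * math.pi * G * mass_kg * K_B)


def mass_with_temperature(temperature):
    """Mass (kg) at which the Hawking temperature equals the given value."""
    return HBAR * C**3 / (8.0 * math.pi * G * K_B * temperature)


if __name__ == "__main__":
    t10 = hawking_temperature(10.0 * M_SUN)
    print(f"T for 10 solar masses is about {t10:.1e} K")
    print(f"T equals the CMB temperature at M = {mass_with_temperature(T_CMB):.1e} kg")
```

Since T is inversely proportional to the mass, only holes far lighter than any star are hotter than the microwave background, which is consistent with the remark that direct observation is probably limited to very small masses.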


Direct observation of black-hole radiation is therefore probably 
only possible for black holes of very small masses. These may have 
been produced soon after the big bang, or it is conceivable that 
they could be created artificially, by advanced technology. If black- 
hole radiation does exist, it may help to resolve the information 
paradox, since it is possible that information that goes into a black 
hole is eventually released via subtle correlations in the black-body 
radiation it emits. 


Particle physics 


Hawking radiation has some intriguing properties from the point 
of view of particle physics. In a particle accelerator, the list of 
particles one can create in appreciable quantities is determined by 
coupling constants. In Hawking radiation, however, we expect to 
see a representative sampling of all types of particles, biased only 
by the fact that massless or low-mass particles are more likely to 
be produced than massive ones. For example, it has been specu- 
lated that some of the universe’s dark matter exists in the form of 
“sterile” particles that do not couple to any force except for gravity. 
Such particles would never be produced in particle accelerators, but 
would be seen in Hawking radiation. Based on present knowledge 
of particle physics, the main components of Hawking radiation, for 
all but the most microscopic black holes, are expected to be pho- 
tons and gravitons, which would compete on roughly equal terms, 
depending on the angular momentum of the black hole.25 


Hawking radiation would violate many cherished conservation 
laws of particle physics. Let a hydrogen atom fall into a black hole. 
We’ve lost a lepton and a baryon, but if we want to preserve con- 
servation of lepton number and baryon number, we cover this up 
with a fig leaf by saying that the black hole has simply increased 
its lepton number and baryon number by +1 each. But eventually 
the black hole evaporates, and the evaporation is probably mostly 
into zero-mass particles such as photons. Once the hole has evapo- 
rated completely, our fig leaf has evaporated as well. There is now 
no physical object to which we can attribute the +1 units of lepton 
and baryon number. 


25 Dong, arxiv.org/abs/1511.05642 


Black-hole complementarity 


A very difficult question about the relationship between quantum 
mechanics and general relativity occurs as follows. In our example 
above, observer A detects an extremely red-shifted spectrum 
of light from the black hole. A interprets this as evidence that 
the space near the event horizon is actually an intense maelstrom 
of radiation, with the temperature approaching infinity as one gets 
closer and closer to the horizon. If B returns from the region near 
the horizon, B will agree with this description. But suppose that 
observer C simply drops straight through the horizon. C does not 
feel any acceleration, so by the equivalence principle C does not 
detect any radiation at all. Passing down through the event horizon, 
C says, “A and B are liars! There’s no radiation at all.” A 
and B, however, see C as having entered a region of infinitely intense 
radiation. “Ah,” says B, “too bad. C should have turned 
back before it got too hot, just as I did.” This is an example of a 
principle we’ve encountered before, that when gravity and quantum 
mechanics are combined, different observers disagree on the number 
of quanta present in the vacuum. We are presented with a paradox, 
because A and B believe in an entirely different version of reality 
than C. A and B say C was fricasseed, but C knows that that didn’t 
happen. One suggestion is that this contradiction shows that the 
proper logic for describing quantum gravity is nonaristotelian, as 
described on page 67. This idea, suggested by Susskind et al., goes 
by the name of black-hole complementarity, by analogy with Niels 
Bohr’s philosophical description of wave-particle duality as being 
“complementary” rather than contradictory. In this interpretation, 
we have to accept the fact that C experiences a qualitatively differ- 
ent reality than A and B, and we comfort ourselves by recognizing 
that the contradiction can never become too acute, since C is lost 
behind the event horizon and can never send information back out. 


6.3.8 Black holes in d dimensions 


It has been proposed that our universe might actually have not 
d = 4 dimensions but some higher number, with the d − 4 “extra” 
ones being spacelike, and curled up on some small scale $\rho$ so that 
we don’t see them in ordinary life. One candidate for such a scale 
$\rho$ is the Planck length, and we then have to talk about theories of 
quantum gravity such as string theory. On the other hand, it could 
be the 1 TeV electroweak scale; the motivation for such an idea is 
that it would allow the unification of electroweak interactions with 
gravity. This idea goes by the name of “large extra dimensions,” 
“large” because $\rho$ is bigger than the Planck length. In fact, in such 
theories the Planck length is the electroweak unification scale, and 
the number normally referred to as the Planck length is not really 
the Planck length.26 


26 Kanti, arxiv.org/abs/hep-ph/0402168 


In d dimensions, there are d − 1 spatial dimensions, and a surface 
of spherical symmetry has d − 2. In the Newtonian weak-field limit, 
the density of gravitational field lines falls off like $m/r^{d-2}$ with distance 
from a source m, and we therefore find that Newton’s law of 
gravity has an exponent of −(d − 2). If d ≠ 3, we can integrate to 
find that the gravitational potential varies as $\Phi \sim -m r^{-(d-3)}$. Passing 
back to the weak-field limit of general relativity, the equivalence 
principle dictates that the $g_{tt}$ term of the metric be approximately 
$1 + 2\Phi$, so we find that the metric has the form 


$ds^2 = (1 - 2m r^{-(d-3)})\,dt^2 - (\ldots)\,dr^2 - r^2\,d\theta^2 - r^2 \sin^2\theta\,d\phi^2.$ 


This looks like the Schwarzschild form with no other change than 
a generalization of the exponent, and in fact Tangherlini showed 
in 1963 that for d > 4, one obtains the exact solution simply by 
applying the same change of exponent to $g_{rr}$ as well.27 
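
The integration step that leads to the potential can be written out explicitly. The following is only a sketch under the stated Newtonian weak-field assumptions; the overall constant of proportionality in the field strength is dropped, as it is in the text, and the symbol $g(r)$ for the field strength is introduced here for illustration.

```latex
\begin{align*}
  g(r) &\sim -\frac{m}{r^{d-2}}
    && \text{(field strength from the density of field lines)} \\
  \Phi(r) &= -\int_r^\infty \frac{m}{r'^{\,d-2}}\,dr'
           = -\frac{m}{(d-3)\,r^{d-3}}
           \sim -m\,r^{-(d-3)}
    && (d > 3) \\
  \Phi(r) &= -\frac{m}{r}
    && (d = 4) \\
  \int \frac{dr'}{r'} &= \ln r'
    && (d = 3,\ \text{no power law})
\end{align*}
```

The factor of $1/(d-3)$ is absorbed into the proportionality sign, and its divergence at $d = 3$ is the first sign that the three-dimensional case needs separate treatment.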


If large extra dimensions do exist, then this is the actual form 
of any black-hole spacetime for r < $\rho$, where the background curvature 
of the extra dimensions is negligible. Since the exponents are 
all changed, gravitational forces become stronger than otherwise ex- 
pected at small distances, and it becomes easier to make black holes. 
It has been proposed that if large extra dimensions exist, microscopic 
black holes would be observed at the Large Hadron Collider. They 
would immediately evaporate into Hawking radiation (p. 250), with 
an experimental signature of violating the standard conservation 
laws of particle physics. As of 2010, the empirical results seem to 
be negative.28 


The reasoning given above fails in the case of d = 3, i.e., 2+1- 
dimensional spacetime, both because the integral of $r^{-1}$ is not $r^0$ 
and because the Tangherlini-Schwarzschild metric is not a vacuum 
solution. As shown in problem 12 on p. 259, there is no counter- 
part of the Schwarzschild metric in 2+1 dimensions. This is essen- 
tially because for d = 3 mass is unitless, so given a source having 
a certain mass, there is no way to set the distance scale at which 
Newtonian weak-field behavior gives way to the relativistic strong 
field. Whereas for d > 4, Newtonian gravity is the limiting case 
of relativity, for d = 3 they are unrelated theories. In fact, the 
relativistic theory of gravity for d = 3 is somewhat trivial. Spacetime 
does not admit curvature in vacuum solutions,29 so that the 
only nontrivial way to make non-Minkowski 2+1-dimensional space- 
times is by gluing together Minkowski pieces in various topologies, 
like gluing pieces of paper to make things like cones and Möbius 
strips. 2+1-dimensional gravity has conical singularities, but not 
Schwarzschild-style ones that are surrounded by curved spacetime. 


27 Emparan and Reall, “Black Holes in Higher Dimensions,” 
relativity.livingreviews.org/Articles/lrr-2008-6/ 

28 http://arxiv.org/abs/1012.3375 

29 arxiv.org/abs/gr-qc/0503022v4 


If black-hole solutions exist in d dimensions, then one can extend 
such a solution to d+1 dimensions with cylindrical symmetry, forming 
a “black string.” The nonexistence of d = 3 black holes implies 
that black string solutions do not exist in our own d = 4 universe. 
However, different considerations arise in a universe with a negative 
cosmological constant (p. 318). There are then 2+1-dimensional solutions 
known as BTZ black holes.30 Since our own universe has a 
positive cosmological constant, not a negative one, we still find that 
black strings cannot exist. 


6.4 Degenerate solutions 
This section can be omitted on a first reading. 


At the event horizon of the Schwarzschild spacetime, the timelike 
and spacelike roles of the Schwarzschild r and t coordinates get 
swapped around, so that the signs in the metric change from + − − − 
to − + − −. In discussing cases like this, it becomes convenient to 
define a new usage of the term “signature,” as s = p − q, where p is 
the number of positive signs and q the number of negative ones. This 
can also be represented by the pair of numbers (p,q). The example 
of the Schwarzschild horizon is not too disturbing, both because 
the funny behavior arises at a singularity that can be removed by a 
change of coordinates and because the signature stays the same. An 
observer who free-falls through the horizon observes that the local 
properties of spacetime stay the same, with |s| = 2, as required by 
the equivalence principle. 


But this only makes us wonder whether there are other examples 
in which an observer would actually detect a change in the metric’s 
signature. We are encouraged to think of the signature as something 
empirically observable because, for example, it has been proposed 
that our universe may have previously unsuspected additional space- 
like dimensions, and these theories make testable predictions. Since 
we don’t notice the extra dimensions in ordinary life, they would 
have to be wrapped up into a cylindrical topology. Some such the- 
ories, like string theory, are attempts to create a theory of quantum 
gravity, so the cylindrical radius is assumed to be on the order of 
the Planck length, which corresponds quantum-mechanically to an 
energy scale that we will not be able to probe using any foresee- 
able technology. But it is also possible that the radius is large — 
a possibility that goes by the name of “large extra dimensions” — 
so that we could see an effect at the Large Hadron Collider. Noth- 
ing in the formulation of the Einstein field equations requires a 3+1 
(i.e., (1,3)) signature, and they work equally well if the signature 
is instead 4+1, 5+1, .... Newton’s inverse-square law of gravity is 
described by general relativity as arising from the three-dimensional 
nature of space, so on small scales in a theory with n large extra dimensions, 
the $1/r^2$ behavior changes over to $1/r^{2+n}$, and it becomes 
possible that the LHC could produce microscopic black holes, which 
would immediately evaporate into Hawking radiation in a characteristic 
way. 


30 arxiv.org/abs/gr-qc/9506079v1 


So it appears that the signature of spacetime is something that 
is not knowable a priori, and must be determined by experiment. 
When a thing is supposed to be experimentally observable, general 
relativity tells us that it had better be coordinate-independent. Is 
this so? A proposition from linear algebra called Sylvester’s law of 
inertia encourages us to believe that it is. The theorem states that 
when a real matrix A is diagonalized by a real, nonsingular change 
of basis (a similarity transformation $S^{-1}AS$), the number of positive, 
negative, and zero diagonal elements is uniquely determined. 
Since a change of coordinates has the effect of applying a similar- 
ity transformation on the metric, it appears that the signature is 
coordinate-independent. 


This is not quite right, however, as shown by the following para- 
dox. The coordinate invariance of general relativity tells us that if 
all clocks, everywhere in the universe, were to slow down simulta- 
neously (with simultaneity defined in any way we like), there would 
be no observable consequences. This implies that the spacetime 
$ds^2 = -t\,dt^2 - d\ell^2$, where $d\ell^2 = dx^2 + dy^2 + dz^2$, is empirically 
indistinguishable from a flat spacetime. Starting from t = −∞, 
the positive $g_{tt}$ component of the metric shrinks uniformly, which 
should be harmless. We can indeed verify by direct evaluation of the 
Riemann tensor that this is a flat spacetime (problem 10, p. 259). 
But for t > 0 the signature of the metric switches from + − − − to 
− − − −, i.e., from Lorentzian (|s| = 2) to Euclidean (|s| = 4). This 
is disquieting. For t < 0, the metric is a perfectly valid description 
of our own universe (which is approximately flat). Time passes, and 
there is no sign of any impending disaster. Then, suddenly, at some 
point in time, the entire structure of spacetime undergoes a horrible 
spasm. This is a paradox, because we could just as well have posed 
our initial conditions using some other coordinate system, in which 
the metric had the familiar form $ds^2 = dt^2 - d\ell^2$. General relativity 
is supposed to be agnostic about coordinates, but a choice of coor- 
dinate leads to a differing prediction about the signature, which is 
a coordinate-independent quantity. 


We are led to the resolution of the paradox if we explicitly 
construct the coordinate transformation involved. In coordinates 
(t, x, y, z), we have $ds^2 = -t\,dt^2 - d\ell^2$. We would like to find the 
relationship between t and some other coordinate u such that we 
recover the familiar form $ds^2 = du^2 - d\ell^2$ for the metric. The tensor 
transformation law gives 


$g_{tt} = \left(\frac{\partial u}{\partial t}\right)^2 g_{uu}, \qquad -t = \left(\frac{\partial u}{\partial t}\right)^2,$ 


with solution 


$u = \pm\frac{2}{3}(-t)^{3/2}, \qquad t < 0.$ 


There is no solution for t > 0. 


a / The change of coordinates is 
degenerate at t = 0. 
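
A quick numerical check of this change of coordinates, as a sketch only: it assumes the branch u = −(2/3)(−t)^(3/2), so that u increases with t, and takes g_uu = 1. The function names and the finite-difference step are illustrative choices, not anything taken from the text.

```python
def u_of_t(t):
    """Candidate time coordinate u = -(2/3) (-t)^(3/2), real only for t <= 0."""
    if t > 0:
        raise ValueError("no real solution for t > 0")
    return -(2.0 / 3.0) * (-t) ** 1.5


def du_dt(t, h=1e-6):
    """Central finite-difference estimate of du/dt (needs t + h <= 0)."""
    return (u_of_t(t + h) - u_of_t(t - h)) / (2 * h)


# The transformation law g_tt = (du/dt)^2 g_uu with g_uu = 1 requires
# (du/dt)^2 = -t at every t < 0.
for t in (-4.0, -1.0, -0.25, -0.01):
    print(f"t = {t:6.2f}   (du/dt)^2 = {du_dt(t) ** 2:.6f}   -t = {-t:.6f}")

# The Jacobian du/dt shrinks toward zero as t approaches 0 from below,
# which is the degeneracy noted in the figure caption.
print([round(du_dt(t), 4) for t in (-1e-2, -1e-3, -1e-4)])
```

The two printed columns should agree to within the finite-difference error, and asking for u at any t > 0 raises an error, mirroring the absence of a solution there.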


If physicists living in this universe, at t < 0, for some reason 
choose t as their time coordinate, there is in fact a way for them to 
tell that the cataclysmic event at t = 0 is not a reliable prediction. 
At t = 0, their metric’s time component vanishes, so its signature 
changes from + − − − to 0 − − −. At that moment, the machinery 
of the standard tensor formulation of general relativity breaks 
down. For example, one can no longer raise indices, because $g^{ab}$ is 
the matrix inverse of $g_{ab}$, but $g_{ab}$ is not invertible. Since the field 
equations are ultimately expressed in terms of the metric using ma- 
chinery that includes raising and lowering of indices, there is no way 
to apply them at t = 0. They don’t make a false prediction of the 
end of the world; they fail to make any prediction at all. Physicists 
accustomed to working in terms of the t coordinate can simply throw 
up their hands and say that they have no way to predict anything 
at t > 0. But they already know that their spacetime is one whose 
observables, such as curvature, are all constant with respect to time, 
so they should ask why this perfect symmetry is broken by singling 
out t = 0. There is physically nothing that should make one mo- 
ment in time different than any other, so choosing a particular time 
to call t = 0 should be interpreted merely as an arbitrary choice of 
the placement of the origin of the coordinate system. This suggests 
to the physicists that all of the problems they’ve been having are not 
problems with any physical meaning, but merely problems arising 
from a poor choice of coordinates. They carry out the calculation 
above, and discover the u time coordinate. Expressed in terms of u, 
the metric is well behaved, and the machinery of prediction never 
breaks down. 


The paradox posed earlier is resolved because Sylvester’s law 
of inertia only applies to a nonsingular transformation S. If S had 
been singular, then the $S^{-1}$ referred to in the theorem wouldn’t even 
have existed. But the transformation from u to t has ∂u/∂t = 0 at 
u = t = 0, so it is singular. This is all in keeping with the general 
philosophy of coordinate-invariance in relativity, which is that only 
smooth, one-to-one coordinate transformations are allowed. Some- 
one who has found a lucky coordinate like u, and who then con- 
templates transforming to t, should realize that it isn’t a good idea, 
because the transformation is not smooth and one-to-one. Someone 
who has started by working with an unlucky coordinate like t finds 
that the machinery breaks down at t = 0, and concludes that it 
would be a good idea to search for a more useful set of coordinates. 
This situation can actually arise in practical calculations. 


What about our original question: could the signature of spacetime 
actually change at some boundary? The answer is now clear. 
Such a change of signature is something that could conceivably 
have intrinsic physical meaning, but if so, then the standard for- 
mulation of general relativity is not capable of making predictions 
about it. There are other formulations of general relativity, such as 
Ashtekar’s, that are ordinarily equivalent to Einstein’s, but that are 
capable of making predictions about changes of signature. However, 
there is more than one such formulation, and they do not agree on 
their predictions about signature changes. 


Problems 


1 Show that in geometrized units, power is unitless. Find the 
equivalent in watts of a power that equals 1 in geometrized units. 


2 The metric of coordinates $(\theta, \phi)$ on the unit sphere is 
$ds^2 = d\theta^2 + \sin^2\theta\,d\phi^2$. (a) Show that there is a singular point at which 
$g^{\phi\phi} \to \infty$. (b) Verify directly that the scalar curvature $R = R^a_a$ 
constructed from the trace of the Ricci tensor is never infinite. (c) 
Prove that the singularity is a coordinate singularity. 


3 (a) Space probes in our solar system often use a slingshot 
maneuver. In the simplest case, the probe is scattered gravitation- 
ally through an angle of 180 degrees by a planet. Show that in some 
other frame such as the rest frame of the sun, in which the planet 
has speed u toward the incoming probe, the maneuver adds 2u to 
the speed of the probe. (b) Suppose that we replace the planet with 
a black hole, and the space probe with a light ray. Why doesn’t this 
accelerate the ray to a speed greater than c? > Solution, p. 394 


4 An observer outside a black hole’s event horizon can never 
observe a test particle falling past the event horizon and later hitting 
the singularity. We could therefore wonder whether general relativ- 
ity’s predictions about the interior of a black hole, and the singular- 
ity in particular, are even a testable scientific theory. However, the 
observer could herself fall into the black hole. The question is then 
whether she would reach the singularity within a finite proper time; 
if so, then it is observable to her. The purpose of this problem is to 
prove that this is so, using the techniques of section 6.2.6, p. 228. 
Suppose for simplicity that the observer starts at rest far away from 
the black hole, and falls directly inward toward it. (a) In the notation 
of section 6.2.6, what are the values of E and L in this case? 
(b) Find the function r(s), i.e., the observer’s Schwarzschild radial 
coordinate as a function of her proper time, and show that she does 
reach the singularity in finite proper time. > Solution, p. 395 


5 The curve given parametrically by $(\cos^3 t, \sin^3 t)$ is called an 
astroid. The arc length along this curve is given by $s = (3/2)\sin^2 t$, 
and its curvature by $k = -(2/3)\csc 2t$. By rotating this astroid 
about the x axis, we form a surface of revolution that can be described 
by coordinates $(t, \phi)$, where $\phi$ is the angle of rotation. (a) 
Find the metric on this surface. (b) Identify any singularities, and 
classify them as coordinate or intrinsic singularities. 
> Solution, p. 395 


6 (a) Section 3.5.4 (p. 109) gave a flat-spacetime metric in 
rotating polar coordinates, 

$ds^2 = (1 - \omega^2 r^2)\,dt^2 - dr^2 - r^2\,d\theta'^2 - 2\omega r^2\,d\theta'\,dt.$ 

Identify the two values of r at which singularities occur, and classify 
them as coordinate or non-coordinate singularities. 

(b) The corresponding spatial metric was found to be 

$ds^2 = -dr^2 - \frac{r^2}{1 - \omega^2 r^2}\,d\theta'^2.$ 

Identify the two values of r at which singularities occur, and classify 
them as coordinate or non-coordinate singularities. 

(c) Consider the following argument, which is intended to provide 
an answer to part b without any computation. In two dimensions, 
there is only one measure of curvature, which is equivalent (up to a 
constant of proportionality) to the Gaussian curvature. The Gaussian 
curvature is proportional to the angular deficit $\epsilon$ of a triangle. 
Since the angular deficit of a triangle in a space with negative curvature 
satisfies the inequality $-\pi < \epsilon < 0$, we conclude that the 
Gaussian curvature can never be infinite. Since there is only one 
measure of curvature in a two-dimensional space, this means that 
there is no non-coordinate singularity. Is this argument correct, 
and is the claimed result consistent with your answers to part b? 
> Solution, p. 395 


7 The first experimental verification of gravitational redshifts 
was a measurement in 1925 by W.S. Adams of the spectrum of light 
emitted from the surface of the white dwarf star Sirius B. Sirius B 
has a mass of 0.98 $M_\odot$ and a radius of $5.9 \times 10^6$ m. Find the redshift. 


8 Show that, as claimed on page 237, applying the change of 
coordinates $t' = t - 2m \ln(r - 2m)$ to the Schwarzschild metric results 
in a metric for which g,, and gj never blow up, but that gt does 
blow up. 


9 Use the geodesic equation to show that, in the case of a 
circular orbit in a Schwarzschild metric, $d^2t/ds^2 = 0$. Explain why 
this makes sense. 


10 Verify by direct calculation, as asserted on p. 255, that the 
Riemann tensor vanishes for the metric $ds^2 = -t\,dt^2 - d\ell^2$, where 
$d\ell^2 = dx^2 + dy^2 + dz^2$. > Solution, p. 396 


11 Suppose someone proposes that the vacuum field equation 
of general relativity isn’t $R_{ab} = 0$ but rather $R_{ab} = k$, where k is 
some constant that describes an innate tendency of spacetime to 
have tidal distortions. Explain why this is not a good proposal. 
> Solution, p. 396 


12 Prove, as claimed on p. 252, that in 2+1 dimensions, with a 
vanishing cosmological constant, there is no nontrivial Schwarzschild 
metric. > Solution, p. 396 


13 On p. 223 I argued that there is no way to define a time- 
reversal operation in general relativity so that it applies to all space- 
times. Why can’t we define it by picking some arbitrary space- 
like surface that covers the whole universe, flipping the velocity of 
every particle on that surface, and evolving a new version of the 
spacetime backward and forward from that surface using the field 
equations? > Solution, p. 397 


14 In Newtonian gravity, a body in a hyperbolic orbit has 
a radius that decreases, reaches a minimum, and then goes back 
out to infinity. Show that if this is to happen in the Schwarzschild 
spacetime, for a particle with zero or nonzero mass, the distance of 
closest approach must be greater than or equal to the Schwarzschild 
radius 2m. (Note that it is possible to have trajectories that pass 
out through the horizon, although we don’t expect to observe such 
trajectories in the case of an astrophysical black hole.) 
> Solution, p. 397 


15 An astronaut in a spacesuit falls into a black hole. Estimate 
an order-of-magnitude inequality for the mass of the black hole if the 
astronaut is to survive past the event horizon without being killed 
by tidal forces. 


Chapter 7 
Symmetries 


This chapter is not required in order to understand the later mate- 
rial. 


7.1 Killing vectors 


7.1.1 Killing vectors 


The Schwarzschild metric is an example of a highly symmetric 
spacetime. It has continuous symmetries in space (under rotation) 
and in time (under translation in time). In addition, it has discrete 
symmetries under spatial reflection and time reversal. In section 
6.2.6, we saw that the two continuous symmetries led to the exis- 
tence of conserved quantities for the trajectories of test particles, 
and that these could be interpreted as mass-energy and angular 
momentum. 


Generalizing, we want to consider the idea that a metric may 
be invariant when every point in spacetime is systematically shifted 
by some infinitesimal amount. For example, the Schwarzschild metric 
is invariant under t → t + dt. In coordinates 
$(x^0, x^1, x^2, x^3) = (t, r, \theta, \phi)$, we have a vector field (dt, 0, 0, 0) that 
defines the time-translation symmetry, and it is conventional to split this into two 
factors, a finite vector field $\xi$ and an infinitesimal scalar, so that the 
displacement vector is 


$\xi\,dt = (1, 0, 0, 0)\,dt.$ 


Such a field is called a Killing vector field, or simply a Killing vector, 
after Wilhelm Killing. When all the points in a space are displaced 
as specified by the Killing vector, they flow without expansion or 
compression. The path of a particular point, such as the dashed line 
in figure a, under this flow is called its orbit. Although the term 
“Killing vector” is singular, it refers to the entire field of vectors, 
each of which differs in general from the others. For example, the 
$\xi$ shown in figure a has a greater magnitude than a $\xi$ near the neck 
of the surface. 


The infinitesimal notation is designed to describe a continuous 
symmetry, not a discrete one. For example, the Schwarzschild spacetime 
also has a discrete time-reversal symmetry t → −t. This can’t 
be described by a Killing vector, because the displacement in time 
is not infinitesimal. 


a/The two-dimensional space 
has a symmetry which can be 
visualized by imagining it as a 
surface of revolution embedded 
in three-space. Without reference 
to any extrinsic features such 
as coordinates or embedding, 
an observer on this surface can 
detect the symmetry, because 
there exists a vector field $\xi\,du$ 
such that translation by $\xi\,du$ 
doesn’t change the distance 
between nearby points. 


b/ Wilhelm Killing (1847-1923). 


c/Vectors at a point P on a 
sphere can be visualized as 
occupying a Euclidean plane that 
is particular to P. 


The Euclidean plane Example: 1 
The Euclidean plane has two Killing vectors corresponding to 
translation in two linearly independent directions, plus a third Killing 
vector for rotation about some arbitrarily chosen origin O. In Cartesian 
coordinates, one way of writing a complete set of these is 


$\xi_1 = (1, 0)$ 
$\xi_2 = (0, 1)$ 
$\xi_3 = (-y, x)$ 


A theorem from classical geometry1 states that any transformation 
in the Euclidean plane that preserves distances and handedness 
can be expressed either as a translation or as a rotation 
about some point. The transformations that do not preserve 
handedness, such as reflections, are discrete, not continuous. 
This theorem tells us that there are no more Killing vectors to be 
found beyond these three, since any translation can be accomplished 
using $\xi_1$ and $\xi_2$, while a rotation about a point P can be 
done by translating P to O, rotating, and then translating O back 
to P. 


In the example of the Schwarzschild spacetime, the components 
of the metric happened to be independent of t when expressed in 
our coordinates. This is a sufficient condition for the existence of a 
Killing vector, but not a necessary one. For example, it is possible 
to write the metric of the Euclidean plane in various forms such 
as $ds^2 = dx^2 + dy^2$ and $ds^2 = dr^2 + r^2\,d\phi^2$. The first form is 
independent of x and y, which demonstrates that x → x + dx and 
y → y + dy are Killing vectors, while the second form gives us 
$\phi \to \phi + d\phi$. Although we may be able to find a particular coordinate 
system in which the existence of a Killing vector is manifest, its 
existence is an intrinsic property that holds regardless of whether we 
even employ coordinates. In general, we define a Killing vector not in 
terms of a particular system of coordinates but in purely geometrical 
terms: a space has a Killing vector $\xi$ if translation by an infinitesimal 
amount $\xi\,du$ doesn’t change the distance between nearby points. 
Statements such as “the spacetime has a timelike Killing vector” are 
therefore intrinsic, since both the timelike property and the property 
of being a Killing vector are coordinate-independent. 


Killing vectors, like all vectors, have to live in some kind of 
vector space. On a manifold, this vector space is particular to a 
given point, figure c. A different vector space exists at every point, 
so that vectors at different points, occupying different spaces, can 
be compared only by parallel transport. Furthermore, we really 
have two such spaces at a given point, a space of contravariant 
vectors and a space of covariant ones. These are referred to as the 
tangent and cotangent spaces. The infinitesimal displacements we’ve 
been discussing belong to the contravariant (upper-index) space, but 
by lowering an index we can just as well discuss them as covariant 
vectors. The customary way of notating Killing vectors makes use of 
the fact, mentioned in passing on p. 202, that the partial derivative 
operators $\partial_0, \partial_1, \partial_2, \partial_3$ form the basis for a vector space. In this 
notation, the Killing vector of the Schwarzschild metric we’ve been 
discussing can be notated simply as 


$\xi = \partial_t.$ 


1 Coxeter, Introduction to Geometry, ch. 3 


The partial derivative notation, like the infinitesimal notation, 
implicitly refers to continuous symmetries rather than discrete ones. 
If a discrete symmetry carries a point $P_1$ to some distant point $P_2$, 
then $P_1$ and $P_2$ have two different tangent planes, so there is not a 
uniquely defined notion of whether vectors $\xi_1$ and $\xi_2$ at these two 
points are equal, or even approximately equal. There can therefore 
be no well-defined way to construe a statement such as, “$P_1$ and 
$P_2$ are separated by a displacement $\xi$.” In the case of a continuous 
symmetry, on the other hand, the two tangent planes come closer 
and closer to coinciding as the distance s between two points on an 
orbit approaches zero, and in this limit we recover an approximate 
notion of being able to compare vectors in the two tangent planes. 
They can be compared by parallel transport, and although parallel 
transport is path-dependent, the difference between paths is proportional 
to the area they enclose, which varies as $s^2$, and therefore 
becomes negligible in the limit s → 0. 


Self-check: Find another Killing vector of the Schwarzschild met- 
ric, and express it in the tangent-vector notation. 


It can be shown that an equivalent condition for a field to be a 
Killing vector is $\nabla_a \xi_b + \nabla_b \xi_a = 0$. This relation, called the Killing 
equation, is written without reference to any coordinate system, in 
keeping with the coordinate-independence of the notion. 
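
To see the Killing equation at work in the simplest case, the sketch below tests it numerically on the Euclidean plane in Cartesian coordinates, where the metric is the identity and covariant derivatives reduce to partial derivatives. The three fields are the translations and the rotation from example 1; the dilation field and all function names are illustrative additions, not taken from the text.

```python
def killing_residual(field, point, h=1e-5):
    """Largest |d_a xi_b + d_b xi_a| at `point` for a vector field on the flat
    plane in Cartesian coordinates (covariant derivative = partial derivative)."""
    x, y = point

    def partial(a, b):
        # Central-difference estimate of d(xi_b)/d(x^a)
        dx, dy = (h, 0.0) if a == 0 else (0.0, h)
        plus = field(x + dx, y + dy)[b]
        minus = field(x - dx, y - dy)[b]
        return (plus - minus) / (2 * h)

    return max(abs(partial(a, b) + partial(b, a))
               for a in range(2) for b in range(2))


def xi_1(x, y):
    return (1.0, 0.0)    # translation along x


def xi_2(x, y):
    return (0.0, 1.0)    # translation along y


def xi_3(x, y):
    return (-y, x)       # rotation about the origin


def dilation(x, y):
    return (x, y)        # uniform expansion: changes distances, not a symmetry


sample_point = (0.7, -1.3)
for name, field in [("xi_1", xi_1), ("xi_2", xi_2),
                    ("xi_3", xi_3), ("dilation", dilation)]:
    print(f"{name:9s} residual = {killing_residual(field, sample_point):.3e}")
```

The three Killing fields give residuals at the level of rounding error, while the dilation gives a residual of 2, reflecting the fact that a uniform expansion stretches the distance between nearby points.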


When a spacetime has more than one Killing vector, any lin- 
ear combination of them is also a Killing vector. This means that 
although the existence of certain types of Killing vectors may be 
intrinsic, the exact choice of those vectors is not. 


Euclidean translations Example: 2 
The Euclidean plane has two translational Killing vectors (1, 0) 
and (0, 1), i.e., $\partial_x$ and $\partial_y$. These same vectors could be expressed 
as (1, 1) and (1, −1) in a coordinate system that was rescaled 
and rotated by 45 degrees. 


A cylinder Example: 3 
The local properties of a cylinder, such as intrinsic flatness, are 
the same as the local properties of a Euclidean plane. Since the 
definition of a Killing vector is local and intrinsic, a cylinder has 
the same three Killing vectors as a plane, if we consider only a 
patch on the cylinder that is small enough so that it doesn’t wrap 
all the way around. However, only two of these, the translations, 
can be extended to form a smooth vector field on the entire 
surface of the cylinder. These might be more naturally notated in 
$(\phi, z)$ coordinates rather than (x, y), giving $\partial_z$ and $\partial_\phi$. 


d / Example 3: A cylinder 
has three local symmetries, but 
only two that can be extended 
globally to make Killing vectors. 


A sphere Example: 4 
A sphere is like a plane or a cylinder in that it is a two-dimensional 
space in which no point has any properties that are intrinsically 
different than any other. We might expect, then, that it would 
have two Killing vectors. Actually it has three, $\xi_x$, $\xi_y$, and $\xi_z$, cor-
responding to infinitesimal rotations about the x, y, and z axes.
To show that these are all independent Killing vectors, we need
to demonstrate that we can’t, for example, have $\xi_x = c_1 \xi_y + c_2 \xi_z$
for some constants $c_1$ and $c_2$. To see this, consider the actions of
$\xi_y$ and $\xi_z$ on the point P where the x axis intersects the sphere.
(References to the axes and their intersection with the sphere are
extrinsic, but this is only for convenience of description and vi-
sualization.) Both $\xi_y$ and $\xi_z$ move P around a little, and these
motions are in orthogonal directions, whereas $\xi_x$ leaves P fixed.
This proves that we can’t have $\xi_x = c_1 \xi_y + c_2 \xi_z$. All three Killing
vectors are linearly independent.


This example shows that linear independence of Killing vectors 
can’t be visualized simply by thinking about the vectors in the 
tangent plane at one point. If that were the case, then we could 
have at most two linearly independent Killing vectors in this two- 
dimensional space. When we say “Killing vector” we're really re- 
ferring to the Killing vector field, which is defined everywhere on 
the space. 
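
The point about fields versus vectors at a single point can be illustrated with a small sketch. It uses the extrinsic description of the unit sphere embedded in three dimensions, where an infinitesimal rotation about an axis moves a point r at the rate (axis) × r. The names `killing_field`, `AXES`, `P`, and `Q` are hypothetical choices made for this illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Euclidean dot product."""
    return sum(ai * bi for ai, bi in zip(a, b))


AXES = {"x": (1.0, 0.0, 0.0), "y": (0.0, 1.0, 0.0), "z": (0.0, 0.0, 1.0)}


def killing_field(axis, point):
    """Displacement rate of `point` under an infinitesimal rotation about `axis`."""
    return cross(AXES[axis], point)


P = (1.0, 0.0, 0.0)   # where the x axis meets the unit sphere
Q = (0.0, 1.0, 0.0)   # where the y axis meets the unit sphere

xi_at_P = {axis: killing_field(axis, P) for axis in AXES}
xi_at_Q = {axis: killing_field(axis, Q) for axis in AXES}

for axis in AXES:
    print(axis, xi_at_P[axis], xi_at_Q[axis])
```

At P the rotation about x vanishes while the other two are nonzero and orthogonal, so no relation with constant coefficients can involve the y and z fields. At Q the rotation about x is nonzero, which rules out the remaining coefficient as well. Only by comparing values at more than one point do we see that all three fields are independent.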


Proving nonexistence of Killing vectors Example: 5 
> Find all Killing vectors of these two metrics: 


$ds^2 = e^{-x}\,dx^2 + e^{x}\,dy^2$
$ds^2 = dx^2 + x^2\,dy^2$.


> Since both metrics are manifestly independent of y, it follows 
that $\partial_y$ is a Killing vector for both of them. Neither one has any
other manifest symmetry, so we can reasonably conjecture that 
this is the only Killing vector either one of them has. However, one 
can have symmetries that are not manifest, so it is also possible 
that there are more. 


One way to attack this would be to use the Killing equation to find 
a system of differential equations, and then determine how many 
linearly independent solutions there were. 


But there is a simpler approach. The dependence of these met- 
rics on x suggests that the spaces may have intrinsic properties 
that depend on x; if so, then this demonstrates a lower symmetry
than that of the Euclidean plane, which has three Killing vectors.
One intrinsic property we can check is the scalar curvature R.
The following Maxima code calculates R for the first metric.


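load(ctensor);                        /* component tensor package */

dim:2;                                /* two-dimensional space */

ct_coords:[x,y];                      /* coordinate names */
lg:matrix([exp(-x),0],[0,exp(x)]);    /* covariant metric diag(e^-x, e^x) */
cmetric();                            /* set up the metric and its inverse */

R:scurvature(); /* scalar curvature */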


The result is $R = -e^{x}$, which demonstrates that points that differ
in x have different intrinsic properties. Since the flow of a Killing
field $\xi$ can never connect points that have different properties,
we conclude that $\xi_x = 0$. If only $\xi_y$ can be nonzero, the Killing
equation $\nabla_a \xi_b + \nabla_b \xi_a = 0$ simplifies to $\nabla_x \xi_y = \nabla_y \xi_y = 0$. These
equations constrain both $\partial_x \xi_y$ and $\partial_y \xi_y$, which means that given
a value of $\xi_y$ at some point in the plane, its value everywhere
else is determined. Therefore the only possible Killing vectors 
are scalar multiples of the Killing vector already found. Since we 
don’t consider Killing vectors to be distinct unless they are linearly 
independent, the first metric only has one Killing vector. 


A similar calculation for the second metric shows that R = 0, and 
an explicit calculation of its Riemann tensor shows that in fact 
the space is flat. It is simply the Euclidean plane written in funny 
coordinates. This metric has the same three Killing vectors as the 
Euclidean plane. 
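
As an independent cross-check on both curvature results, here is a minimal numerical sketch in Python rather than Maxima. It assumes the standard formula for the Gaussian curvature K of a metric of the form $ds^2 = E(x)\,dx^2 + G(x)\,dy^2$, together with R = 2K in two dimensions. The function names and the step size h are illustrative choices.

```python
import math


def scalar_curvature(E, G, x, h=1e-4):
    """Scalar curvature R = 2K of ds^2 = E(x) dx^2 + G(x) dy^2 at position x.

    Uses K = -(1/sqrt(E G)) d/dx[ (d sqrt(G)/dx) / sqrt(E) ], with both
    derivatives approximated by central differences of step h.
    """
    def ratio(s):
        d_sqrt_G = (math.sqrt(G(s + h)) - math.sqrt(G(s - h))) / (2 * h)
        return d_sqrt_G / math.sqrt(E(s))

    d_ratio = (ratio(x + h) - ratio(x - h)) / (2 * h)
    gaussian_curvature = -d_ratio / math.sqrt(E(x) * G(x))
    return 2 * gaussian_curvature


def first_metric_R(x):
    """R for ds^2 = e^-x dx^2 + e^x dy^2."""
    return scalar_curvature(lambda s: math.exp(-s), lambda s: math.exp(s), x)


def second_metric_R(x):
    """R for ds^2 = dx^2 + x^2 dy^2 (valid for x > h)."""
    return scalar_curvature(lambda s: 1.0, lambda s: s * s, x)


for x in (0.5, 1.0, 2.0):
    print(x, first_metric_R(x), -math.exp(x), second_metric_R(x))
```

Up to finite-difference error, the first column of curvatures tracks $-e^{x}$ and the second vanishes, in agreement with the results quoted above. The overall sign of R depends on convention; the one assumed here gives a sphere positive curvature.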


It would have been tempting to leap to the wrong conclusion about 
the second metric by the following reasoning. The signature of a 
metric is an intrinsic property. The metric has signature ++ every- 
where in the plane except on the y axis, where it has signature 
+0. This shows that the y axis has different intrinsic properties 
than the rest of the plane, and therefore the metric must have a 
lower symmetry than the Euclidean plane. It can have at most two 
Killing vectors, not three. This contradicts our earlier conclusion. 
The resolution of this paradox is that this metric has a removable 
degeneracy of the same type as the one described in section 6.4. 
As discussed in that section, the signature is invariant only under 
nonsingular transformations, but the transformation that converts 
these coordinates to Cartesian ones is singular. 
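
To make the removable degeneracy explicit, it may help to write out the transformation. The second metric is the plane in polar coordinates, with x playing the role of the radius and y the angle. The names X and Y for the Cartesian coordinates are introduced here only for illustration.

```latex
X = x\cos y, \qquad Y = x\sin y
\quad\Longrightarrow\quad
dX^2 + dY^2 = dx^2 + x^2\,dy^2,
\qquad
\det\frac{\partial(X,Y)}{\partial(x,y)} = x .
```

The Jacobian determinant vanishes at x = 0, which is exactly where the signature appears to change, so the transformation is singular there and the apparent change of signature carries no intrinsic meaning.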


7.1.2 Inappropriate mixing of notational systems 
Confusingly, it is customary to express vectors and dual vectors 
by summing over basis vectors like this: 
$v = v^\mu \partial_\mu$


$\omega = \omega_\mu\, dx^\mu$.




This is an abuse of notation, driven by the desire to have up-down 
pairs of indices to sum according to the usual rules of the Einstein 
notation convention. But by that convention, a quantity like v or 
$\omega$ with no indices is a scalar, and that’s not the case here. The
products on the right are not tensor products, i.e., the indices aren’t 
being contracted. 


This muddle is the result of trying to make the Einstein notation 
do too many things at once and of trying to preserve a clumsy and 
outdated system of notation and terminology originated by Sylvester 
in 1853. In pure abstract index notation, there are not six flavors of 
objects as in the two equations above but only two: vectors like $v^a$
and dual vectors like $\omega_a$. The Sylvester notation is the prevalent one
among mathematicians today, because their predecessors committed 
themselves to it a century before the development of alternatives 
like abstract index notation and birdtracks. The Sylvester system is 
inconsistent with the way physicists today think of vectors and dual 
vectors as being defined by their transformation properties, because 
Sylvester considers v and $\omega$ to be invariant.


Mixing the two systems leads to the kinds of notational clashes 
described above. As a particularly absurd example, a physicist who 
is asked to suggest a notation for a vector will typically pick up a 
pen and write $v^\mu$. We are then led to say that a vector is written in
a concrete basis as a linear combination of dual vectors $\partial_\mu$!


7.1.3 Conservation laws 


Whenever a spacetime has a Killing vector, geodesics have a 
constant value of $v^b \xi_b$, where $v^b$ is the velocity four-vector. For ex-
ample, the Schwarzschild metric has a Killing vector $\xi = \partial_t$, which,
because of the notational clash described above, is an upper-index
(contravariant) vector: $\xi^t = 1$. Test particles therefore have a con-
served value of $v_t$, interpreted as the mass-energy per unit mass.
Since we normally work with upper-index versions of velocities, we
can also express this by saying that $\xi_t = (1 - 2m/r)$, and $\xi_t v^t$ is con-
served, i.e., we have conservation of $(1 - 2m/r)\,dt/d\tau$. None of this
depends on the choice of an affine parameter, so for a photon the 
conserved quantity would still exist, but would have to be expressed 
in terms of some other parameter. 
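
The reason the quantity is conserved can be seen in one line. The following is a sketch of the standard argument, in which $\lambda$ is assumed to be an affine parameter along the geodesic and $v^a$ the corresponding tangent vector.

```latex
\frac{d}{d\lambda}\left(v^b \xi_b\right)
  = v^a \nabla_a \left(v^b \xi_b\right)
  = \underbrace{\left(v^a \nabla_a v^b\right)}_{=\,0\ \text{(geodesic equation)}} \xi_b
  + \underbrace{v^a v^b \nabla_a \xi_b}_{=\,0\ \text{(Killing equation)}}
  = 0 .
```

The second term vanishes because $v^a v^b$ is symmetric in its indices, so only the symmetric part $\tfrac{1}{2}(\nabla_a \xi_b + \nabla_b \xi_a)$ contributes, and that part is zero for a Killing vector.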


In addition, one can define a globally conserved quantity found
by integrating the flux density $P^a = T^{ab} \xi_b$ over the boundary of any
compact orientable region.² In case of a flat spacetime, there are
enough Killing vectors to give conservation of energy-momentum
and angular momentum.


² Hawking and Ellis, The Large Scale Structure of Space-Time, p. 62, give
a succinct treatment that describes the flux densities and proves that Gauss’s
theorem, which ordinarily fails in curved spacetime for a non-scalar flux, holds in
the case where the appropriate Killing vectors exist. For an explicit description
of how one can integrate to find a scalar mass-energy, see Winitzki, Topics in
General Relativity, section 3.1.5, available for free online.


Energy-momentum in flat 1+1 spacetime Example: 6 
A flat 1+1-dimensional spacetime has Killing vectors $\partial_x$ and $\partial_t$.
Corresponding to these are the conserved momentum and mass- 
energy, p and E. If we do a Lorentz boost, these two Killing vec- 
tors get mixed together by a linear transformation, corresponding 
to a transformation of p and E into a new frame. 
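
The mixing of p and E under a boost can be made concrete with a short sketch. It assumes units with c = 1 and the usual 1+1-dimensional Lorentz transformation. The function name `boost` and the sample values are hypothetical.

```python
import math


def boost(E, p, v):
    """Lorentz-boost (E, p) to a frame moving at velocity v, in units with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (E - v * p), gamma * (p - v * E)


E, p = 5.0, 3.0                      # hypothetical particle with E^2 - p^2 = 16
E_new, p_new = boost(E, p, 0.6)      # 0.6 = p/E, the particle's own velocity
mass_squared_before = E * E - p * p
mass_squared_after = E_new * E_new - p_new * p_new
print(E_new, p_new, mass_squared_before, mass_squared_after)
```

Each conserved quantity in the new frame is a linear combination of the two in the old frame, just as the Killing vectors are, while the combination $E^2 - p^2$ is left unchanged.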


Gravitational Doppler shift for a spherical body Example: 7 
In section 6.2.7, p. 232, we used relatively simple mathematical 
techniques to show that the Doppler shift or gravitational time di-
lation factor for the Schwarzschild spacetime is $\sqrt{1 - 1/r}$. (Here
we choose units such that m = 1/2, so that the Schwarzschild
radius is 1.) We now redo the analysis using fancier techniques
that can be generalized to applications such as example 8 below.
For convenience of notation, let $A = 1 - 1/r$.


Let a ray of light travel from an emitter with four-velocity u to an 
observer with four-velocity u’. Let the ray’s lightlike geodesic have 
tangent vectors v and v’ at emission and observation. We take 
the u vectors to be normalized. Normalization is impossible for 
the v vectors, but we assume that they are constructed using the 
same affine parameter, i.e., that v and v’ are the same under 
parallel transport, so that any normalization factor will be an over- 
all constant. The resulting Doppler shift is 


$\omega'/\omega = u'_a v'^a / u_b v^b$,

which is coordinate-independent and also independent of the choice
of affine parameter for v. This relation is a purely kinematical fact, 
but a quick and dirty way to see that it must be true is that the 
energy-momentum vector of a light ray is proportional to its four- 
velocity, and therefore the numerator and denominator of this ex- 
pression each represent the respective observer’s measurement 
of the ray’s energy. 


Specializing now to the specific physical situation being analyzed, 
we know that the emitter and observer are both at rest relative
to the black hole, so that in Schwarzschild coordinates u and u’
have only t components. Because these vectors are normalized,
and the metric has $g_{tt} = A$, we have $u^t = A^{-1/2}$ and $u'^t = A'^{-1/2}$.


The ray has a conserved energy $A v^t$, so that $A v^t = A' v'^t$. There is
also a nonzero $v^r$, but we don’t need to calculate it for our present
purposes. 


The Doppler shift comes out to be $(A'/A)^{-1/2}$, which is consistent
with the result found previously by more elementary methods. 
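
For readers who want to see the intermediate steps, the algebra can be sketched as follows. Only the t components of u and u’ are nonzero, so each contraction reduces to a single term, and the last step uses the conserved energy of the ray.

```latex
\frac{\omega'}{\omega}
  = \frac{u'_a v'^a}{u_b v^b}
  = \frac{A'\,u'^t\,v'^t}{A\,u^t\,v^t}
  = \frac{A'^{1/2}\,v'^t}{A^{1/2}\,v^t}
  = \frac{A'^{1/2}}{A^{1/2}}\cdot\frac{A}{A'}
  = \left(\frac{A'}{A}\right)^{-1/2}.
```

For an emitter at infinity, A = 1, and the result reduces to $A'^{-1/2} = 1/\sqrt{1 - 1/r}$, a blueshift for light falling toward a static observer.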


Doppler shifts across the horizon Example: 8 
As remarked in section 6.2.7, p. 232, we cannot extend the kind
of analysis in example 7 to the case where a ray crosses the hori-
zon, because inside the horizon, there can be no observers or
emitters that are at rest (i.e., having constant r). We can, how- 
ever, let an observer fall in through the horizon and continue ob- 
serving the light from the stars, all the way up until she hits the 
singularity. To make the results tractable and easy to interpret, let 
us have the observer infall from rest at infinity, and take the ray to 
be purely radial as well, i.e., the observer is looking at light from 
a star that has always been directly overhead during her free-fall. 


Using the same notation as in example 7, we find the following
results in Schwarzschild coordinates $(t, r)$. The emitting star has


$u^\kappa = (1, 0)$.


The observer has conserved energy per unit mass $A u^t$, which
equals 1 because she was initially at rest at infinity. At observa-
tion of the ray, she will have $u'^t = A'^{-1}$, and imposing normaliza-
tion gives

$u'^\kappa = (A'^{-1}, -\sqrt{1 - A'})$,

with the minus sign because she is infalling. At emission, we fix
the normalization of the ray’s velocity such that

$v^\kappa = (1, -1)$.

Applying conservation of energy and $v^a v_a = 0$, we find

$v'^\kappa = (A'^{-1}, -1)$,

where again the minus sign is because the ray is infalling. Plug-
ging in to the relation $\omega'/\omega = u'_a v'^a / u_b v^b$ from example 7, we find
$\omega'/\omega = A'^{-1}(1 - \sqrt{1 - A'})$, or, more transparently,

$\frac{\omega'}{\omega} = \frac{1}{1 + \sqrt{1 - A'}}$.


This result is graphed in figure e. It is well behaved at the horizon
r = 1. For large r, we have $\omega'/\omega \approx 1 - r^{-1/2}$, which is a kind
of Newtonian approximation, since our observer has $v = r^{-1/2}$,
which is the result of Newtonian conservation of energy
$(1/2)v^2 - m/r = 0$, with m = 1/2 in this system of units. The shift is a red-
shift, which is our semi-Newtonian expectation for large r but in
fact holds for all r.
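
The claims about the behavior of this result can be checked numerically. The sketch below evaluates the Doppler factor in two ways: by contracting the components directly, taking the ray’s tangent vector at observation to be (1/A’, -1) as required by conservation of energy and the null condition, and by the closed form $1/(1 + r^{-1/2})$, which follows from $1 - A' = 1/r$. The function names are illustrative, and the units are those of the example (m = 1/2, horizon at r = 1).

```python
import math


def doppler_components(r):
    """omega'/omega from the contraction u'_a v'^a, with u_b v^b = 1 at emission.

    Not usable exactly at r = 1, where the Schwarzschild components blow up.
    """
    A = 1.0 - 1.0 / r                         # A' in the text
    u = (1.0 / A, -math.sqrt(1.0 - A))        # infalling observer, (t, r) components
    v = (1.0 / A, -1.0)                       # infalling radial ray, (t, r) components
    # Contract using g_tt = A and g_rr = -1/A.
    return A * u[0] * v[0] - (1.0 / A) * u[1] * v[1]


def doppler_closed_form(r):
    """omega'/omega = 1 / (1 + r^(-1/2)), finite for every r > 0."""
    return 1.0 / (1.0 + r ** -0.5)


for r in (0.5, 4.0, 100.0, 1.0e4):
    print(r, doppler_components(r), doppler_closed_form(r), 1.0 - r ** -0.5)

print(doppler_closed_form(1.0))               # value at the horizon
```

The two evaluations agree both outside and inside the horizon, the closed form is finite at r = 1, and for large r it approaches the semi-Newtonian estimate $1 - r^{-1/2}$.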


The Doppler shift is always finite for r > 0, contrary to various 
claims that can be found in the popular literature to the effect that 
such an observer “freezes” at the horizon, and therefore sees the 
entire eventual history of the universe played out in the infalling 
visible light. This claim is presumably based on a belief that the 
vanishing of the Schwarzschild coordinate velocity has some in-
trinsic physical meaning. Although it is possible to have such see-
the-whole-future behavior in some spacetimes, it does not occur
in the Schwarzschild spacetime. In a spacetime where such an
effect did occur, the incoming radiation would probably have infi- 
nite intensity, not only annihilating the observer but also possibly 
violating the approximation of a vacuum solution. 


A similar calculation is carried out in problem 7, p. 291, for the 
case where the observer is in a circular orbit. The result is that 
such an observer will always see both blueshifts and redshifts, 
depending on the direction from which the rays arrive. 


7.2 Spherical symmetry 


A little more work is required if we want to link the existence of 
Killing vectors to the existence of a specific symmetry such as spher- 
ical symmetry. When we talk about spherical symmetry in the con- 
text of Newtonian gravity or Maxwell’s equations, we may say, “The 
fields only depend on r,” implicitly assuming that there is an r coor- 
dinate that has a definite meaning for a given choice of origin. But 
coordinates in relativity are not guaranteed to have any particu- 
lar physical interpretation such as distance from a particular origin. 
The origin may not even exist as part of the spacetime, as in the 
Schwarzschild metric, which has a singularity at the center. Another 
possibility is that the origin may not be unique, as on a Euclidean 
two-sphere like the earth’s surface, where a circle centered on the 
north pole is also a circle centered on the south pole; this can also 
occur in certain cosmological spacetimes that describe a universe 
that wraps around on itself spatially. 


We therefore define spherical symmetry as follows. A spacetime 
S is spherically symmetric if we can write it as a union $S = \bigcup s_{r,t}$ of
nonintersecting subsets $s_{r,t}$, where each s has the structure of a two-
sphere, and the real numbers r and t have no preassigned physical
interpretation, but $s_{r,t}$ is required to vary smoothly as a function
of them. By “has the structure of a two-sphere,” we mean that no 
intrinsic measurement on s will produce any result different from the 
result we would have obtained on some two-sphere. A two-sphere 
has only two intrinsic properties: (1) it is spacelike, i.e., locally its 
geometry is approximately that of the Euclidean plane; (2) it has 
a constant positive curvature. If we like, we can require that the 
parameter r be the corresponding radius of curvature, in which case 
t is some timelike coordinate. 


To link this definition to Killing vectors, we note that condition 
2 is equivalent to the following alternative condition: (2’) The set 
s should have three Killing vectors (which by condition 1 are all
spacelike), and it should be possible to choose these Killing vectors
such that algebraically they act the same as the ones constructed
explicitly in example 4 on p. 264. As an example of such an algebraic
property, figure a shows that rotations are noncommutative.


A cylinder is not a sphere Example: 9
> Show that a cylinder does not have the structure of a two- 
sphere. 


> The cylinder passes condition 1. It fails condition 2 because its 
Gaussian curvature is zero. Alternatively, it fails condition 2’ be- 
cause it has only two independent Killing vectors (example 3). 


A plane is not a sphere Example: 10 
> Show that the Euclidean plane does not have the structure of a 
two-sphere. 


> Condition 2 is violated because the Gaussian curvature is zero. 
Or if we wish, the plane violates 2’ because $\partial_x$ and $\partial_y$ commute,
but none of the Killing vectors of a 2-sphere commute. 
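
The algebraic difference between the two cases can be checked directly. The sketch below compares finite rotations about the x and y axes applied in either order, and then does the same for two translations of the plane. The helper names (`rot_x`, `rot_y`, `matmul`, `translate`) and the choice of a quarter turn are illustrative assumptions.

```python
import math


def rot_x(theta):
    """Rotation matrix about the x axis by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(theta):
    """Rotation matrix about the y axis by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(A, B):
    """Product of two 3x3 matrices; (A B) acts with B first, then A."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def max_difference(A, B):
    """Largest absolute difference between corresponding matrix entries."""
    return max(abs(A[i][j] - B[i][j]) for i in range(3) for j in range(3))


quarter_turn = math.pi / 2
x_then_y = matmul(rot_y(quarter_turn), rot_x(quarter_turn))
y_then_x = matmul(rot_x(quarter_turn), rot_y(quarter_turn))
rotation_mismatch = max_difference(x_then_y, y_then_x)


def translate(point, shift):
    """Translate a point of the plane by a displacement vector."""
    return (point[0] + shift[0], point[1] + shift[1])


start = (0.0, 0.0)
a_then_b = translate(translate(start, (1.0, 0.0)), (0.0, 2.0))
b_then_a = translate(translate(start, (0.0, 2.0)), (1.0, 0.0))
print(rotation_mismatch, a_then_b == b_then_a)
```

The two orderings of the rotations give visibly different matrices, while the two orderings of the translations land on the same point.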


7.3 Penrose diagrams and causality


We can’t directly visualize a four-dimensional manifold. When a 
spacetime has a symmetry, however, we may be able to visual-
ize the relevant properties of the whole thing by considering a lower-
dimensional part of it. By analogy, if we wanted to visualize the
structure of the earth’s interior, we might draw a diagram showing 
a two-dimensional section through its center. In fact, we could get 
rid of two dimensions and simply draw a diagram of a single radial 
line running from the earth’s core to its surface; each point on this 
line would then represent a sphere. If we do this in general relativity, 
for a spacetime that is spherically symmetric, then we can reduce
the four-dimensional manifold to a two-dimensional one, with each point rep-
resenting a two-sphere. By applying some further tricks, we will see
that we can end up with a very convenient and useful visualization 
called a Penrose diagram, also known as a Penrose-Carter diagram 
or causal diagram. 


7.3.1 Flat spacetime 


As a warmup, figure a shows a Penrose diagram for flat (Minkowski)
spacetime. The diagram looks 1 + 1-dimensional, but the conven-
tion is that spherical symmetry is assumed, so two more dimensions
are hidden, and we’re really portraying 3 + 1 dimensions. A typical
point on the interior of the diamond region represents a 2-sphere. 
On this type of diagram, light cones look just like they would on a 
normal spacetime diagram of Minkowski space, but distance scales 
are highly distorted. The diamond represents the entire spacetime, 
with the distortion fitting this entire infinite region into that finite 
area on the page. Despite the distortion, the diagram shows lightlike 
surfaces as 45-degree diagonals. Spacelike and timelike geodesics, 
however, are distorted, as shown by the curves in the diagram. 


The distortion becomes greater as we move away from the center
of the diagram, and becomes infinite near the edges. Because of
this infinite distortion, the points $i^-$ and $i^+$ actually represent 3-
spheres. All timelike curves start at $i^-$ and end at $i^+$, which are
idealized points at infinity, like the vanishing points in perspective
drawings. We can think of $i^+$ as the “Elephants’ graveyard,” where
massive particles go when they die. Similarly, lightlike curves end
on $\mathscr{I}^+$ (which includes its mirror image on the left), referred to as
null infinity. The point at $i^0$ is an infinitely distant endpoint for
spacelike curves. Because of the spherical symmetry, the left and
right halves of the diagram are redundant.


a / Penrose diagram for flat spacetime.


b / Given a finite region of spacetime S, we can find a point like P
that is spacelike with respect to the whole region, and a point like
Q that is timelike with respect to the whole region.


c / Penrose diagram for Schwarzschild spacetime: a black hole that
didn’t form by gravitational collapse.


It is possible to make up explicit formulae that translate back 
and forth between Minkowski coordinates and points on the dia- 
mond, but in general this is not necessary. In fact, the utility of 
the diagrams is that they let us think about causal relationships in 
coordinate-independent ways. A light cone on the diagram looks 
exactly like a normal light cone. 
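
For readers who would like to see one such set of formulae anyway, here is a minimal sketch. It uses a common arctangent compactification of the null coordinates t + r and t - r. This particular map is an assumption chosen for illustration, not necessarily the one used to draw figure a, and the name `to_diamond` is hypothetical.

```python
import math


def to_diamond(t, r):
    """Map Minkowski (t, r) to compactified diagram coordinates (T, R).

    The null coordinates t + r and t - r are squeezed into a finite range
    by the arctangent, so all of spacetime lands in |T| + |R| < pi.
    """
    p = math.atan(t + r)
    q = math.atan(t - r)
    return p + q, p - q


# Points along an outgoing radial light ray t - r = c.
c = 0.7
ray = [to_diamond(c + r, r) for r in (0.0, 1.0, 10.0, 1000.0)]
for T, R in ray:
    print(T, R, T - R)

# A very distant event still maps to a point inside the diamond.
T_far, R_far = to_diamond(1.0e9, 2.0e9)
print(T_far, R_far)
```

Along the light ray the combination T - R stays fixed, so the ray is drawn as a 45-degree line, while an event at enormous t and r still appears at a finite position on the page.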


Since this particular spacetime is homogeneous, it makes no dif- 
ference what spatial location on the diagram we pick as our axis 
of symmetry. For example, we could arbitrarily pick the left-hand 
corner, the central timelike geodesic (drawn straight) or one of the 
other timelike geodesics (represented as if it were curved). 


It may seem awkward or inconsistent that on the diagram, $\mathscr{I}^+$
and $\mathscr{I}^-$ are shown as lines (representing 3-dimensional things),
while $i^0$, $i^+$, and $i^-$ are points (representing 2-spheres). Figure
b shows why this actually makes sense. We can find points that
are spacelike or timelike in relation to an entire region, but it is not
possible to find a point that is lightlike in relation to every point.
This argument can be made more rigorous using Liouville’s theorem 
from complex analysis. 


7.3.2 Schwarzschild spacetime


Figure c is a Penrose diagram for the Schwarzschild spacetime, 
i.e., a spacetime that looks like Minkowski space, except that it has 
one eternal black hole in it. This is a black hole that did not form 
by gravitational collapse. This spacetime isn’t homogeneous; it has 
a specific location that is its center of spherical symmetry, and this 
is the vertical line on the left marked r = 0. The triangle is the 
spacetime inside the event horizon; we could have copied it across 
the r = 0 line if we had so desired, but the copies would have been 
redundant. 


The Penrose diagram makes it easy to reason about causal re- 
lationships. For example, we can see that if a particle reaches a 
point inside the event horizon, its entire causal future lies inside 
the horizon, and all of its possible future world-lines intersect the 
singularity. The horizon is a lightlike surface, which makes sense, 
because it’s defined as the boundary of the set of points from which
a light ray could reach $\mathscr{I}^+$.³


³ See also p. 283.


7.3.3 Astrophysical black hole


Figure d is a Penrose diagram for a black hole that has formed 
by gravitational collapse. Using this type of diagram, we can suc- 
cinctly address one of the most vexing FAQs about black holes. 
(Cf. section 6.3.3, p. 237, where we took a more cumbersome ap- 
proach without Penrose diagrams.) If a distant observer watches the 
collapsing cloud of matter from which the black hole forms, her op- 
tical observations will show that the light from the matter becomes 
more and more gravitationally redshifted, and if she wishes, she can 
interpret this as an example of gravitational time dilation. As she 
waits longer and longer, the light signals from the infalling matter 
take longer and longer to arrive. The redshift approaches infinity 
as the matter approaches the horizon, so the light waves ultimately 
become too low in energy to be detectable by any given instrument. 
Furthermore, her patience (or her lifetime) will run out, because 
the time on her clock approaches infinity as she waits to get signals 
from matter that is approaching the horizon. This is all exactly as it 
should be, since the horizon is by definition the boundary of her ob- 
servable universe. (A light ray emitted from the horizon will end up 
at $i^+$, which is an end-point of timelike world-lines reached only by
observers who have experienced an infinite amount of proper time.) 


People who are bothered by these issues often acknowledge the 
external unobservability of matter passing through the horizon, and 
then want to pass from this to questions like, “Does that mean the 
black hole never really forms?” This presupposes that our distant 
observer has a uniquely defined notion of simultaneity that applies 
to a region of space stretching from her own position to the interior 
of the black hole, so that she can say what’s going on inside the black 
hole “now.” But the notion of simultaneity in general relativity is 
even more limited than its counterpart in special relativity. Not 
only is simultaneity in general relativity observer-dependent, as in 
special relativity, but it is also local rather than global. 


In figure e, E is an event on the world-line of an observer. The 
spacelike surface $S_1$ is one possible “now” for this observer. Accord-
ing to this surface, no particle has ever fallen in and reached the
horizon; every such particle has a world-line that intersects $S_1$, and
therefore it’s still on its way in. 


$S_2$ is another possible “now” for the same observer at the same
time. According to this definition of “now,” all the particles have
passed the event horizon, but none have hit the singularity yet.
Finally, $S_3$ is a “now” according to which all the particles have hit
the singularity. 


If this were special relativity, then we could decide which surface
was the correct notion of simultaneity for the observer, based on
the observer’s state of motion. But in general relativity, this only
works locally (which is why I made all three surfaces coincide near 
E). There is no well-defined way of deciding which is the correct way 
of globally extending this notion of simultaneity. 


Although it may seem strange that we can’t say whether the 
singularity has “already” formed according to a distant observer, 
this is really just an inevitable result of the fact that the singularity 
is spacelike. The same thing happens in the case of a Schwarzschild 
spacetime, which we think of as a description of an eternal black 
hole, i.e., one that has always existed and always will. On the 
similar Penrose diagram for an eternal black hole, we can still draw 
a spacelike surface like $S_1$ or $S_2$, representing a definition of “now”
such that the singularity doesn’t exist yet. 


7.3.4 Penrose diagrams in general 


Ideally we would like to generalize the procedure for drawing 
Penrose diagrams so that we would be able to uniquely determine 
one for any spacetime. This turns out to be not so clear-cut. The 
procedure would go something like this: 


1. Make an n-dimensional section or projection, where usually, 
but not always, n = 2. 


2. Do a transformation to reduce the resulting manifold to a flat 
one of finite size. 


3. Adjoin idealized surfaces and points at infinity. 


At step 1, we want to take advantage of any symmetries, such as 
rotational symmetry, so that the final result will be informative, be 
representative of the whole spacetime, and accurately depict causal 
relationships in the original spacetime. If the original spacetime has 
a low degree of symmetry (e.g., a spacetime containing three black 
holes arranged in a triangle), then this might require n > 2. At this 
step we also need to make sure that lightlike geodesics in the original 
space correspond properly to lightlike geodesics in the submanifold. 


For step 2, we have already given a geometrical characterization 
of the type of transformation we have in mind, which is called a 
conformal transformation. It turns out to be possible to encapsulate 
this idea in a simple analytic way. Given a spacetime with a metric
g, we define a fictitious metric $\tilde{g} = \Omega^2 g$, where $\Omega$ is a nonzero real
number that varies from point to point. (Cf. sec. 5.11, p. 202, where
$\Omega$ was constant.) The idea here is that g and $\tilde{g}$ agree on where the
light cone is, but they disagree on the measurement of distances and
times. The same manifold equipped with the fictitious metric $\tilde{g}$ is
the one being drawn on the page when we make a Penrose diagram.
We let $\Omega \to 0$ as we approach the idealized boundary regions like
$i^0$ and $\mathscr{I}^+$, and this is what causes the Penrose diagram to take up
finite space on the page.
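
Both claims can be illustrated with a short worked example. The first line below shows why the rescaled metric shares its light cones with the original one. The rest is a sketch for flat 1+1-dimensional spacetime, in which the null coordinates $\alpha$, $\beta$ and the compactified coordinates p, q are names introduced here for illustration.

```latex
\tilde{g}_{ab}\,v^a v^b = \Omega^2\, g_{ab}\,v^a v^b = 0
\quad\Longleftrightarrow\quad
g_{ab}\,v^a v^b = 0 \qquad (\Omega \neq 0).

ds^2 = dt^2 - dx^2 = d\alpha\, d\beta,
\qquad \alpha = t + x, \quad \beta = t - x .

\alpha = \tan p, \quad \beta = \tan q
\quad\Longrightarrow\quad
ds^2 = \sec^2 p\, \sec^2 q \; dp\, dq .

\Omega = \cos p\, \cos q
\quad\Longrightarrow\quad
d\tilde{s}^2 = \Omega^2\, ds^2 = dp\, dq,
\qquad -\tfrac{\pi}{2} < p,\, q < \tfrac{\pi}{2} .
```

The rescaled metric is again flat, but the coordinates p and q cover only a finite diamond, and $\Omega \to 0$ as p or q approaches $\pm\pi/2$, which is where the idealized boundary is adjoined.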


It is not possible in general to do what is required in step 2 
by making a conformal transformation to change a manifold into 
a flat one. A manifold that can be flattened in this way is called 
conformally flat. All two-dimensional manifolds are conformally flat, 
so in the n = 2 case this is guaranteed. For n > 3 we will usually 
not have conformal flatness if there are gravitational waves or tidal 
forces present. 


The most problematic part, surprisingly, is step 3. This topic 
goes under the general heading of “boundary constructions.” Re-
views are available on this topic.⁴ There are a number of more or less
specific techniques for constructing a boundary, with an alphabet 
soup of names including the g-boundary, c-boundary, b-boundary, 
and a-boundary. As someone who is not a specialist in this subfield, 
the impression I get is that this is an area of research that has turned 
out badly and has never produced any useful results, but work con- 
tinues, and it is possible that at some point the smoke will clear. As 
a simple example of what one would like to get, but doesn’t get, from 
these studies, it would seem natural to ask how many dimensions 
there are in a black hole singularity. (See p. 247 for a discussion 
of why this is a nontrivial question.) Different answers come back 
from the different methods. For example, the b-boundary approach 
says that both black-hole and cosmological singularities are zero- 
dimensional points, while in the c-boundary method (which was de- 
signed to harmonize with Penrose diagrams) they are three-surfaces 
(as one would imagine from the Penrose diagrams). 


7.3.5 Global hyperbolicity 


Causality refers to our vaguely defined feeling that the world 
should have an orderly progression of cause and effect. Making 
this notion more precise is surprisingly difficult. Penrose diagrams, 
and their associated concepts, are essentially representations of the 
causal structure of spacetime, and these turn out to be helpful 
in putting together one of the more satisfying attempts to define 
causality. This definition is called global hyperbolicity. The ob- 
scure terminology is related to the classification of partial differential 
equations. 


Some definitions are required as a preliminary. Consider a set
S of events in spacetime. S is bounded if it does not include any of
the idealized points on the Penrose diagram that we have added at
infinity.⁵ S is closed if it contains its own boundary.⁶ S is compact
if it is closed and bounded.


⁴ Ashley, “Singularity theorems and the abstract boundary construc-
tion,” https://digitalcollections.anu.edu.au/handle/1885/46055. Garcia-Parrado
and Senovilla, “Causal structures and causal boundaries,”
http://arxiv.org/abs/gr-qc/0501069.


Compact and noncompact light cones Example: 11 
Figure f shows a spacetime containing a black hole that forms 
by gravitational collapse. Point P is inside the event horizon, Q 
outside. Consider the following four point-sets: 


$I^+(P)$, called the chronological future of P, is the interior of P’s
future-directed light cone.


$J^+(P)$ is like $I^+(P)$, but also includes events that are on the bound-
ary of the light cone, i.e., events that cannot be connected to P
by a timelike curve but that can be connected to it by a lightlike
curve. We call this the causal future of P, since it is the set of
events that could be caused by P.


$I^+(Q)$ and $J^+(Q)$ are the analogous sets built on Q.


Of these four sets, only $J^+(P)$ is compact. $I^+(P)$ is noncompact be-
cause it is not closed. $I^+(Q)$ and $J^+(Q)$ are noncompact because
they are not bounded; they include idealized points at infinity that
lie in $\mathscr{I}^+$ and $i^+$.


In addition to the notation introduced in example 11, we will
need the similar notations $I^-$ and $J^-$ for the corresponding past
light-cones.


Definition: A spacetime is globally hyperbolic if: (1) there are 
no closed, timelike curves (CTCs),⁷ and (2) given any two events P
and Q, the intersection of $J^+(P)$ and $J^-(Q)$ is compact. (Condition
2 is required only when P and Q are points in the manifold, not 
boundary points.) 
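
To get a feeling for condition 2, it may help to see the intersection written out in the simplest case. The sketch below tests membership in $J^+(P) \cap J^-(Q)$ for flat 1+1-dimensional spacetime with c = 1, where events are (t, x) pairs. The function names and the sample events are hypothetical.

```python
def in_causal_future(P, E):
    """True if event E = (t, x) lies in J+(P) in flat 1+1 spacetime (c = 1)."""
    dt = E[0] - P[0]
    dx = E[1] - P[1]
    return dt >= abs(dx)      # ">=" keeps the light cone itself, so the set is closed


def in_causal_past(Q, E):
    """True if E lies in J-(Q), i.e. Q lies in J+(E)."""
    return in_causal_future(E, Q)


def in_diamond(P, Q, E):
    """True if E lies in the intersection of J+(P) and J-(Q)."""
    return in_causal_future(P, E) and in_causal_past(Q, E)


P = (0.0, 0.0)
Q = (4.0, 0.0)
print(in_diamond(P, Q, (2.0, 2.0)))    # corner of the diamond, on both light cones
print(in_diamond(P, Q, (2.0, 2.5)))    # spacelike in relation to P
print(in_diamond(P, Q, (5.0, 0.0)))    # later than Q
```

In this flat example the intersection is a diamond that contains its lightlike edges and is confined to a finite range of t and x, so it is both closed and bounded.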


In a globally hyperbolic spacetime, initial-value problems always 
have unique solutions. That is, we can pick a spacelike surface and 
give the value of a wave on that surface, and the wave equation 
will then have a unique solution. Such a surface is called a Cauchy 
surface. 


We can readily verify by inspection of the Penrose diagrams that
the spacetimes described earlier in this section are globally
hyperbolic. Condition 2 implies that the intersection doesn’t contain any
singularities or points at infinity. Although black hole spacetimes
do contain singularities, the spacelike nature of these singularities
implies that they can never lie in the intersection of light cones as
referred to in the definition. Therefore such spacetimes are globally
hyperbolic.


⁵ More rigorously, this is equivalent to saying that for any geodesic in S, there
is a bound on the affine parameter.

⁶ To make this more precise, we proceed as described in section 5.10.6, p. 201
by enlarging the set of points in our spacetime manifold M to include points at
infinitesimal distances from one of the original points. Then S is closed if, for
any point in the enlarged version of S, there is a point lying at an infinitesimal
distance from it in the original version of S.

⁷ For precision, the condition needs to be made a little stronger. We want no
closed, non-spacelike curves, and we also want it to be impossible for curves to
exist that are arbitrarily close to being such curves, in the sense that for any
event, there exists a neighborhood around it that can never be revisited.


Example 12: Global hyperbolicity
Figure g/1 shows a piece cut out of Minkowski space. The dashed 
outline is meant to indicate that the piece doesn’t include its bound- 
ary. This spacetime is not globally hyperbolic. For certain choices 
of events P and Q, the intersection $J^+(P) \cap J^-(Q)$ could extend
out to the cut at the edge. Since the spacetime doesn’t include 
its boundary, this intersection would not be compact. It’s easy to 
see why causality fails in this spacetime. If we pick a spacelike 
surface near the bottom of the diagram, it would only cut through 
a small part of the bottom of the spacetime. At later times, the 
spacetime grows at a rate that is greater than c. Therefore such 
a surface cannot be used as a Cauchy surface; given the initial 
conditions on this surface, we cannot predict what will happen in 
the parts of the universe that are outside its causal future. 


In g/2 we have the same example, but now the boundary is in- 
cluded. This set is not a manifold, which excludes it from consid- 
eration as a spacetime in general relativity. 




Figure g/3 is a picture of Minkowski space with a timelike singularity
in it (dashed line). Singularities are not point-sets in the
manifold, so topologically, this is like Minkowski space with a single
timelike curve surgically removed. Global hyperbolicity fails
because the intersection $J^+(P) \cap J^-(Q)$ could surround the
singularity, and would not be compact because it would not include its
boundary at the singularity. This violation of global hyperbolicity
indicates a failure of causality in such a spacetime (see p. 242).


By cutting off the lower half of the diamond representing Minkowski 
spacetime, we obtain figure g/4. The dashed line indicates that 
the boundary is not included, and therefore this is a manifold. It 
is also globally hyperbolic. This example suggests that global hy- 
perbolicity does not necessarily capture everything we might ever 
want to describe in a definition of causality. If a paleontologist liv- 
ing in this spacetime finds a dinosaur fossil embedded in a rock, 
she will naturally infer that a dinosaur lived at some point in the 
past, causing the fossil to exist. But perhaps this is not the case 
— the hypothetical dinosaur might be one that would have ex- 
isted before the boundary. This creationism-flavored violation of 
causality is of a different flavor than the situation we would have 
had if the bottom edge of the diagram had been a big bang sin- 
gularity; in that case, we would have had a knowable reason why 
chains of cause and effect could not be extended back into the 
past beyond a certain time. 




g/ Example 11.


7.4 Static and stationary spacetimes


7.4.1 Stationary spacetimes 


When we set out to describe a generic spacetime, the Alice 
in Wonderland quality of the experience is partly because coordi- 
nate invariance allows our time and distance scales to be arbitrarily 
rescaled, but also partly because the landscape can change from one 
moment to the next. The situation is drastically simplified when the 
spacetime has a timelike Killing vector. Such a spacetime is said to 
be stationary. Two examples are flat spacetime and the spacetime 
surrounding the rotating earth (in which there is a frame-dragging 
effect). Non-examples include the solar system, cosmological mod- 
els, gravitational waves, and a cloud of matter undergoing gravita- 
tional collapse. 


Can Alice determine, by traveling around her spacetime and 
carrying out observations, whether it is stationary? If it’s not, then 
she might be able to prove it. For example, suppose she visits a 
certain region and finds that the Kretschmann invariant $R^{abcd}R_{abcd}$
varies with time in her frame of reference. Maybe this is because 
an asteroid is coming her way, in which case she could readjust her 
velocity vector to match that of the asteroid. Even if she can’t see 
the asteroid, she can still try to find a velocity that makes her local 
geometry stop changing in this particular way. If the spacetime is 
truly stationary, then she can always “tune in” to the right velocity 
vector in this way by searching systematically. If this procedure ever 
fails, then she has proved that her spacetime is not stationary. 


Self-check: Why is the timelike nature of the Killing vector im- 
portant in this story? 


Proving that a spacetime is stationary is harder. This is partly 
just because spacetime is infinite, so it will take an infinite amount 
of time to check everywhere. We aren’t inclined to worry too much 
about this limitation on our geometrical knowledge, which is of a 
type that has been familiar since thousands of years ago, when it 
upset the ancient Greeks that the parallel postulate could only be 
checked by following lines out to an infinite distance. But there is a 
new type of limitation as well. The Schwarzschild spacetime is not 
stationary according to our definition. In the coordinates used in 
section 6.2, $\partial_t$ is a Killing vector, but is only timelike for $r > 2m$; for
r < 2m it is spacelike. Although the solution describes a black hole 
that is going to sit around forever without changing, no observer 
can ever verify that fact, because once she strays inside the horizon 
she must follow a timelike world-line, which will end her program of 
observation within some finite time. 
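The change in character of the Killing vector at the horizon can be read off directly from the metric. The following is a short worked check, assuming the standard Schwarzschild form of the metric in the signature used in this book; the symbol $\xi$ for the Killing vector is introduced here only for convenience.

```latex
% Schwarzschild metric in the coordinates (t, r, \theta, \phi):
ds^2 = \left(1-\frac{2m}{r}\right)dt^2
     - \left(1-\frac{2m}{r}\right)^{-1}dr^2
     - r^2\left(d\theta^2+\sin^2\theta\,d\phi^2\right)

% Squared norm of the Killing vector \xi = \partial_t:
\xi^a \xi_a = g_{tt} = 1-\frac{2m}{r}

% r > 2m : \xi^a\xi_a > 0  (timelike)
% r = 2m : \xi^a\xi_a = 0  (lightlike)
% r < 2m : \xi^a\xi_a < 0  (spacelike)
```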


7.4.2 Isolated systems


Asymptotic flatness


This unfortunate feature of our definition of stationarity — its 
empirical unverifiability — is something that in general we just have 
to live with. But there is an alternative in the special case of an 
isolated system, such as our galaxy or a black hole. It may be a 
good approximation to ignore distant matter, modeling such a sys- 
tem with a spacetime that is asymptotically flat. The notion of 
asymptotic flatness was introduced informally on p. 149. Formu- 
lating the definition of this term rigorously and in a coordinate- 
invariant way involves a large amount of technical machinery, since 
we are not guaranteed to be presented in advance with a special, 
physically significant set of coordinates that would lead directly 
to a quantitative way of defining words like “nearby.” The essential
idea is that a spacetime is asymptotically flat if it is possible
to perform a conformal transformation in such a way that the
result has idealized regions at infinity $i^0$, $\mathscr{I}^+$, and
$\mathscr{I}^-$ (but not $i^+$ and $i^-$) that look like those of
Minkowski space. The reader who wants to see the full machinery
can find presentations in various places, such as Hawking and Ellis,
ch. 11 of Wald, or the open-access review article
“Conformal Infinity” at
link.springer.com/article/10.12942/lrr-2004-1.


Asymptotically stationary spacetimes 


In the case of an asymptotically flat spacetime, we say that it is
also asymptotically stationary if it has a Killing vector that becomes
timelike far away. Some authors (e.g., Ludvigsen) define
“stationary” to mean what I’m calling “asymptotically stationary,”
others (Hawking and Ellis) define it the same way I do, and still
others (Carroll) are not self-consistent. The Schwarzschild spacetime is
asymptotically stationary, but not stationary.


7.4.3 A stationary field with no other symmetries 


Consider the most general stationary case, in which the only 
Killing vector is the timelike one. The only ambiguity in the choice 
of this vector is a rescaling; its direction is fixed. At any given point 
in space, we therefore have a notion of being at rest, which is to 
have a velocity vector parallel to the Killing vector. An observer at 
rest detects no time-dependence in quantities such as tidal forces. 


Points in space thus have a permanent identity. The gravita- 
tional field, which the equivalence principle tells us is normally an 
elusive, frame-dependent concept, now becomes more concrete: it is 
the proper acceleration required in order to stay in one place. We 
can therefore use phrases like “a stationary field,” without the usual 
caveats about the coordinate-dependent meaning of “field.” 


Space can be sprinkled with identical clocks, all at rest.
Furthermore, we can compare the rates of these clocks, and even
compensate for mismatched rates, by the following procedure. Since the
spacetime is stationary, experiments are reproducible. If we send a
photon or a material particle from a point A in space to a point B,
then identical particles emitted at later times will follow identical
trajectories. The time lag between the arrival of two such particles
tells an observer at B the amount of time at B that corresponds to 
a certain interval at A. If we wish, we can adjust all the clocks so 
that their rates are matched. An example of such rate-matching is 
the GPS satellite system, in which the satellites’ clocks are tuned 
to 10.22999999543 MHz, matching the ground-based clocks at 10.23 
MHz. (Strictly speaking, this example is out of place in this subsec- 
tion, since the earth’s field has an additional azimuthal symmetry.) 
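As a rough check on the GPS numbers, the sketch below computes the fractional rate offset implied by the two frequencies quoted above and compares it with a simple estimate. The estimate is only an illustration: the values of GM, the earth's radius, and the orbital radius are assumed round numbers that do not come from the text, the orbit is taken to be circular, and the ground clock's own motion is ignored.

```python
# Fractional rate offset built into the GPS satellite clocks, from the two
# frequencies quoted in the text.
f_ground = 10.23e6               # Hz, ground-based clocks
f_satellite = 10.22999999543e6   # Hz, satellite clocks as tuned
offset_quoted = (f_ground - f_satellite) / f_ground

# Rough estimate for comparison. Every value below is an assumption made
# for this sketch, not a number taken from the text.
GM = 3.986004e14     # m^3/s^2, gravitational parameter of the earth
c = 299792458.0      # m/s
R_ground = 6.371e6   # m, mean radius of the earth
r_orbit = 2.656e7    # m, approximate radius of a GPS orbit

# Gravitational term: the higher clock runs fast by (potential difference)/c^2.
grav = GM / c**2 * (1.0 / R_ground - 1.0 / r_orbit)
# Kinematic term: the moving clock runs slow by v^2/(2 c^2), with v^2 = GM/r
# for a circular orbit.
kin = GM / (2.0 * r_orbit * c**2)
offset_estimated = grav - kin

print(offset_quoted)
print(offset_estimated)
```

The two numbers agree to within about a percent, which suggests that the tuning is accounted for almost entirely by the gravitational and kinematic time dilation of the orbiting clock relative to one on the ground.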


It is tempting to conclude that this type of spacetime comes 
equipped with a naturally preferred time coordinate that is unique 
up to a global affine transformation $t \rightarrow at+b$. But to construct such
a time coordinate, we would have to match not just the rates of the 
clocks, but also their phases. The best method relativity allows for 
doing this is Einstein synchronization (p. ??), which involves trading 
a photon back and forth between clocks A and B and adjusting 
the clocks so that they agree that each clock gets the photon at 
the mid-point in time between its arrivals at the other clock. The 
trouble is that for a general stationary spacetime, this procedure is 
not transitive: synchronization of A with B, and of B with C, does 
not guarantee agreement between A and C. This is because the
time it takes a photon to travel clockwise around triangle ABCA 
may be different from the time it takes for the counterclockwise 
itinerary ACBA. In other words, we may have a Sagnac effect, which 
is generally interpreted as a sign of rotation. Such an effect will 
occur, for example, in the field of the rotating earth, and it cannot 
be eliminated by choosing a frame that rotates along with the earth, 
because the surrounding space experiences a frame-dragging effect, 
which falls off gradually with distance. 
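The failure of transitivity can be made quantitative. What follows is a sketch based on a standard textbook result rather than on anything derived in this section, and the symbols $g_{ti}$, $\Omega$, and $A$ are introduced here for illustration. It assumes coordinates adapted to the Killing vector, so that the metric components do not depend on $t$, and units with $c=1$.

```latex
% Stationary metric in adapted coordinates (i, j run over the spatial coordinates):
ds^2 = g_{tt}\,dt^2 + 2g_{ti}\,dt\,dx^i + g_{ij}\,dx^i dx^j

% Einstein synchronization of two neighboring clocks at rest, separated by dx^i,
% assigns simultaneous events a coordinate-time difference
dt = -\frac{g_{ti}}{g_{tt}}\,dx^i

% Carrying the synchronization once around a closed loop leaves a mismatch
\Delta t = -\oint \frac{g_{ti}}{g_{tt}}\,dx^i

% Illustration: flat spacetime described in a frame rotating at angular velocity \Omega,
ds^2 = (1-\Omega^2 r^2)\,dt^2 - 2\Omega r^2\,d\phi\,dt - dr^2 - r^2 d\phi^2 - dz^2

% For a circle of radius r enclosing area A = \pi r^2, with \Omega r \ll 1:
\Delta t = \oint \frac{\Omega r^2}{1-\Omega^2 r^2}\,d\phi \approx 2\Omega A
```

The difference between the clockwise and counterclockwise light travel times is twice this mismatch. Restoring factors of $c$ gives $2\Omega A/c^2$, which for a loop around the earth's equator comes to roughly 0.2 microseconds. Synchronization is transitive only when the loop integral vanishes for every closed path.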


Although a stationary spacetime does not have a uniquely pre- 
ferred time, it does prefer some time coordinates over others. In a 
stationary spacetime, it is always possible to find a “nice” t such 
that the metric can be expressed without any t-dependence in its 
components. 


7.4.4 A stationary field with additional symmetries 


Most of the results given above for a stationary field with no 
other symmetries also hold in the special case where additional sym- 
metries are present. The main difference is that we can make linear 
combinations of a particular timelike Killing vector with the other 
Killing vectors, so the timelike Killing vector is not unique. This 
means that there is no preferred notion of being at rest. For exam- 
ple, in a flat spacetime we cannot define an observer to be at rest if 
she observes no change in the local observables over time, because 
that is true for any inertial observer. Since there is no preferred rest
frame, we can’t define the gravitational field in terms of that frame,
and there is no longer any preferred definition of the gravitational 
field. 


7.4.5 Static spacetimes 


In addition to synchronizing all clocks to the same frequency, we 
might also like to be able to match all their phases using Einstein 
synchronization, which requires transitivity. Transitivity is frame- 
dependent. For example, flat spacetime allows transitivity if we use 
the usual coordinates. However, if we change into a rotating frame of 
reference, transitivity fails (see p. 109). If coordinates exist in which 
a particular spacetime has transitivity, then that spacetime is said 
to be static. In these coordinates, the metric is diagonalized, and 
since there are no space-time cross-terms like $dx\,dt$ in the metric,
such a spacetime is invariant under time reversal. Roughly speaking, 
a static spacetime is one in which there is no rotation. 


7.4.6 Birkhoff’s theorem 


Birkhoff’s theorem, proved below, states that in the case of 
spherical symmetry, the vacuum field equations have a solution, 
the Schwarzschild spacetime, which is unique up to a choice of co- 
ordinates and the value of m. Let’s enumerate the assumptions 
that went into our derivation of the Schwarzschild metric on p. 222. 
These were: (1) the vacuum field equations, (2) spherical symmetry, 
(3) asymptotic staticity, (4) a certain choice of coordinates, and (5) 
$\Lambda=0$. Birkhoff’s theorem says that the assumption of staticity was
not necessary. That is, even if the mass distribution contracts and 
expands over time, the exterior solution is still the Schwarzschild 
solution. Birkhoff’s theorem holds because gravitational waves are 
transverse, not longitudinal (see p. 378), so the mass distribution’s 
radial throbbing cannot generate a gravitational wave. 


Proof of Birkhoff’s theorem: Spherical symmetry guarantees
that we can introduce coordinates r and t such that the surfaces
of constant r and t have the structure of a sphere with radius r. On
one such surface we can introduce colatitude and longitude coordinates
$\theta$ and $\phi$. The $(\theta,\phi)$ coordinates can be extended in a natural
way to other values of r by choosing the radial lines to lie in the
direction of the covariant derivative vector⁸ $\nabla_a r$, and this ensures
that the metric will not have any nonvanishing terms in $dr\,d\theta$ or
$dr\,d\phi$, which could only arise if our choice had broken the symmetry
between positive and negative values of $d\theta$ and $d\phi$. Just as we
were free to choose any way of threading lines of constant $(\theta,\phi,t)$
between spheres of different radii, we can also choose how to thread
lines of constant $(\theta,\phi,r)$ between different times, and this can be
done so as to keep the metric free of any time-space cross-terms such
as $d\theta\,dt$. The metric can therefore be written in the form⁹


$$ds^2 = h(t,r)\,dt^2 - k(t,r)\,dr^2 - r^2(d\theta^2 + \sin^2\theta\,d\phi^2).$$


This has to be a solution of the vacuum field equations, $R_{ab} = 0$,
and in particular a quick calculation with Maxima shows that $R_{tr} =
-\partial_t k/k^2 r$, so k must be independent of time. With this restriction,
we find $R_{rr} = -\partial_r h/hkr - 1/r^2 - 1/kr^2 = 0$, and since k is
time-independent, $\partial_r h/h$ is also time-independent. This means that for
a particular time $t_o$, the function $f(r) = h(t_o,r)$ has some universal
shape set by a differential equation, with the only possible ambiguity
being an over-all scaling that depends on $t_o$. But since h is the
time-time component of the metric, this scaling corresponds physically to
a situation in which every clock, all over the universe, speeds up and
slows down in unison. General relativity is coordinate-independent,
so this has no observable effects, and we can absorb it into a
redefinition of t that will cause h to be time-independent. Thus the
metric can be expressed in the time-independent diagonal form


$$ds^2 = h(r)\,dt^2 - k(r)\,dr^2 - r^2(d\theta^2 + \sin^2\theta\,d\phi^2).$$


We have already solved the field equations for a metric of this form
and found as a solution the Schwarzschild spacetime.¹⁰ Since the
metric’s components are all independent of t, $\partial_t$ is a Killing
vector, and it is timelike for large r, so the Schwarzschild spacetime is
asymptotically static.


⁸ It may seem backwards to start talking about the covariant derivative of
a particular coordinate before a complete coordinate system has even been
introduced. But (excluding the trivial case of a flat spacetime), r is not just an
arbitrary coordinate; it is something that an observer at a certain point in
spacetime can determine by mapping out a surface of geometrically identical points,
and then determining that surface’s radius of curvature. Another worry is that it
is possible for $\nabla_a r$ to misbehave on certain surfaces, such as the event horizon of
the Schwarzschild spacetime, but we can simply require that radial lines remain
continuous as they pass through these surfaces.
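The step in the proof where the time-dependent scaling of h is absorbed into the time coordinate can be written out explicitly. This is a sketch of that one step only; the names $F(r)$, $c(t)$, and $t'$ are introduced here for illustration and do not appear in the text.

```latex
% With k = k(r), the remaining field equation fixes the logarithmic
% derivative of h as a function of r alone:
\frac{\partial_r h}{h} = F(r)

% Integrating with respect to r at fixed t, the "constant" of integration
% may still depend on t:
\ln h(t,r) = \int F(r)\,dr + c(t)
\quad\Longrightarrow\quad
h(t,r) = e^{c(t)} f(r), \qquad f(r) = \exp\!\int F(r)\,dr

% The time-dependent factor is absorbed by defining a new time coordinate t':
dt' = e^{c(t)/2}\,dt
\quad\Longrightarrow\quad
h(t,r)\,dt^2 = f(r)\,dt'^2
```

Since the change from $t$ to $t'$ involves only $t$, it introduces no new cross-terms, and the metric keeps its diagonal form.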


7.4.7 No-hair theorems 


Birkhoff’s theorem is similar to a set of theorems called no-hair 
theorems describing black holes. The most general no-hair theo- 
rem states that a black hole is completely characterized by its mass, 
charge, and angular momentum. Other than these three numbers, 
nobody on the outside can recover any information that was pos- 
sessed by the matter and energy that were sucked into the black 
hole. 


It has been proposed¹¹ that the no-hair theorem for nonzero angular
momentum and zero charge could be tested empirically by
observations of Sagittarius A*. If the observations are consistent
with the no-hair theorem, it would be taken as supporting the
validity of general relativity and the interpretation of this object as a
supermassive black hole. If not, then there are various possibilities,
including a failure of general relativity to be the correct theory of
strong gravitational fields, or a failure of one of the theorem’s other
assumptions, such as the nonexistence of closed timelike curves in
the surrounding universe.


⁹ On the same surfaces referred to in the preceding footnote, the functions h
and k may go to 0 or ∞. These turn out to be nothing more serious than
coordinate singularities.

¹⁰ The Schwarzschild spacetime is the uniquely defined geometry found by
removing the coordinate singularities from this form of the Schwarzschild metric.

¹¹ Johannsen and Psaltis, http://arxiv.org/abs/1008.3902v1


The no-hair theorems say that relativity only has a small repertoire
of types of black holes, defined as regions of space that are
causally disconnected from the universe, in the sense that future
light-cones of points in the region do not extend to infinity.¹² That
is, a black hole is defined as a region hidden behind an event horizon,
and since the definition of an event horizon is dependent on the
observer, we specify an observer infinitely far away. Birkhoff’s theorem
has a somewhat different structure than the no-hair
theorems, since it assumes a symmetry and proves the existence of
an event horizon (if the vacuum region is extended to small enough
radii), whereas the no-hair theorems assume an event horizon and
prove the form of the metric, including its symmetries.


The no-hair theorems cannot classify naked singularities, i.e., 
those not hidden behind horizons. The role of naked singularities in 
relativity is the subject of the cosmic censorship hypothesis, which is 
an open problem. The theorems do not rule out the Big Bang singu- 
larity, because we cannot define the notion of an observer infinitely 
far from the Big Bang. We can also see that Birkhoff’s theorem 
does not prohibit the Big Bang, because cosmological models are 
not vacuum solutions with $\Lambda = 0$. Black string solutions are not
ruled out by Birkhoff’s theorem because they would lack spherical 
symmetry, so we need the arguments given on p. 252 to show that 
they don’t exist. 


We saw on pp. 247 and 275 that there is no clearly defined way
to treat a singularity as a geometrical object, and that this
ambiguity extends even to such seemingly straightforward questions
as how many dimensions it has. Geometrically, as Gertrude Stein
said about Oakland, there’s “no there there.” We could also ask
whether a black hole singularity has any physical properties. If so,
then the no-hair theorems would limit the list of such properties to
at most three. But we cannot ascribe these properties to the
singularity itself. Rather, they are properties of some large region of the
spacetime, measurable by an observer at asymptotic infinity. Such
an observer cannot say whether a black hole’s mass is a property of
the singularity; she cannot even say whether the singularity exists
“now.” In this sense a black hole singularity is not an “it.” Asking
about “its” properties is like asking what time it is when the tip of
the minute hand is at the center of the clock. The dial only exists
around the circumference of the circle, not at its center.


¹² For a more formal statement of this, see Hawking and Ellis, “The Large Scale
Structure of Space-Time,” p. 315. Essentially, the region must be a connected
region on a spacelike three-surface, and there must be no lightlike world-lines
that connect points in that region to null infinity. Null infinity, which was
introduced briefly on p. 272, is defined formally using conformal techniques, but
basically refers to points that are infinitely far away in both space and time, and
have the two infinities equal in a certain sense, so that a free light ray could end up
there. The definition is based on the assumption that the surrounding spacetime
is asymptotically flat, since otherwise null infinity can’t be defined. It is not
actually necessary to assume a singularity as part of the definition; the no-hair
theorems guarantee that one exists.


7.4.8 The gravitational potential 


When Pound and Rebka made the first observation of gravita- 
tional redshifts, these shifts were interpreted as evidence of gravita- 
tional time dilation, i.e., a mismatch in the rates of clocks. We are 
accustomed to connecting these two ideas by using the expression 
$e^{\Delta\Phi}$ for the ratio of the rates of two clocks (example 11, p. 58),
where $\Phi$ is a function of the spatial coordinates, and this is in fact
the most general possible definition of a gravitational potential $\Phi$
in relativity. Since a stationary field allows us to compare rates 
of clocks, it seems that we should be able to define a gravitational 
potential for any stationary field. There is a problem, however, be- 
cause when we talk about a potential, we normally have in mind 
something that has encoded within it all there is to know about the 
field. We would therefore expect to be able to find the metric from 
the potential. But the example of the rotating earth shows that this 
need not be the case for a general stationary field. In that example, 
there are effects like frame-dragging that clearly cannot be deduced 
from $\Phi$; for by symmetry, $\Phi$ is independent of azimuthal angle, and
therefore it cannot distinguish between the direction of rotation and 
the contrary direction. In a static spacetime, these rotational effects 
don’t exist; a static vacuum spacetime can be described completely 
in terms of a single scalar potential plus information about the spa- 
tial curvature. 
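To get a feeling for the size of the effect measured by Pound and Rebka, the sketch below evaluates the ratio of clock rates for a small height difference near the earth's surface, using the weak-field estimate that the change in potential is gh/c². The tower height and the value of g are assumed values supplied here for illustration; the text does not give them.

```python
import math

# Assumed inputs for this sketch (not taken from the text):
g = 9.8          # m/s^2, gravitational field near the earth's surface
height = 22.5    # m, height difference between the two clocks
c = 299792458.0  # m/s

# Potential difference in units where c = 1, using the weak-field estimate g*h/c^2.
delta_phi = g * height / c**2

# Ratio of the rates of the upper and lower clocks.
rate_ratio = math.exp(delta_phi)

# Fractional frequency shift, exp(delta_phi) - 1. expm1 avoids the loss of
# precision that comes from subtracting 1 from a number this close to 1.
fractional_shift = math.expm1(delta_phi)

print(delta_phi)
print(fractional_shift)
```

The shift comes out at a few parts in 10¹⁵, and at this level the exponential is indistinguishable from its linear approximation.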


There are two main reasons why relativity does not offer a grav- 
itational potential with the same general utility as its Newtonian 
counterpart. 


The Einstein field equations are nonlinear. Therefore one can- 
not, in general, find the field created by a given set of sources by 
adding up the potentials. At best this is a possible weak-field ap- 
proximation. In particular, although Birkhoff’s theorem is in some 
ways analogous to the Newtonian shell theorem, it cannot be used 
to find the metric of an arbitrary spherically symmetric mass distri- 
bution by breaking it up into spherical shells. 


It is also not meaningful to talk about any kind of gravitational 
potential for spacetimes that aren’t static or stationary. For example,
consider a cosmological model describing our expanding universe.
Such models are usually constructed according to the Copernican
principle that no position in the universe occupies a privileged
place. In other words, they are homogeneous in the sense that they 
have Killing vectors describing arbitrary translations and rotations. 
Because of this high degree of symmetry, a gravitational potential 
for such a model would have to be independent of position, and then 
it clearly could not encode any information about the spatial part of 
the metric. Even if we were willing to make the potential a function 
of time, $\Phi(t)$, the results would still be nonsense. The gravitational
potential is defined in terms of rate-matching of clocks, so a poten- 
tial that was purely a function of time would describe a situation 
in which all clocks, everywhere in the universe, were changing their 
rates in a uniform way. But this is clearly just equivalent to a redefi- 
nition of the time coordinate, which has no observable consequences 
because general relativity is coordinate-invariant. A corollary is that 
in a cosmological spacetime, it is not possible to give a natural pre- 
scription for deciding whether a particular redshift is gravitational 
(measured by $\Phi$) or kinematic, or some combination of the two (see
also p. 339). 


7.5 The uniform gravitational field revisited 


This section gives a somewhat exotic example. It is not necessary 
to read it in order to understand the later material. 


In problem 7 on page 209, we made a wish list of desired proper- 
ties for a uniform gravitational field, and found that they could not 
all be satisfied at once. That is, there is no global solution to the 
Einstein field equations that uniquely and satisfactorily embodies 
all of our Newtonian ideas about a uniform field. We now revisit 
this question in the light of our new knowledge. 


The 1+1-dimensional metric

$$ds^2 = e^{2gz}dt^2 - dz^2$$

is the one that uniquely satisfies our expectations based on the
equivalence principle (example 11, p. 58), and it is a vacuum
solution. We might logically try to generalize this to 3+1 dimensions
as follows:

$$ds^2 = e^{2gz}dt^2 - dx^2 - dy^2 - dz^2.$$

But a funny thing happens now — simply by slapping on the two 
new Cartesian axes x and y, it turns out that we have made our 
vacuum solution into a non-vacuum solution, and not only that, 
but the resulting stress-energy tensor is unphysical (ch. 8, problem 
8, p. 368). 


One way to proceed would be to relax our insistence on making
the spacetime one that exactly embodies the equivalence principle’s
requirements for a uniform field.¹³ This can be done by taking $g_{tt} =
e^{2\Phi}$, where $\Phi$ is not necessarily equal to $gz$. By requiring that the
metric be a 3+1 vacuum solution, we arrive at a differential equation
whose solution is $\Phi = \ln(z + k_1) + k_2$, which recovers the flat-space
metric that we found in example 19 on page 140 by applying a
change of coordinates to the Lorentz metric.


¹³ Thanks to physicsforums.com user Mentz114 for suggesting this approach
and demonstrating the following calculation.
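It may help to see explicitly why the logarithmic solution is just flat spacetime in disguise. The following sketch assumes that the time-time component of the metric is $e^{2\Phi}$ and that the spatial part is Cartesian; the coordinates $T$, $Z$, $t'$, and $z'$ are names introduced here for illustration.

```latex
% The solution for the potential, and the resulting metric component:
\Phi = \ln(z+k_1) + k_2
\quad\Longrightarrow\quad
g_{tt} = e^{2\Phi} = e^{2k_2}(z+k_1)^2

% Rescale the time and shift the height:
T = e^{k_2}\,t, \qquad Z = z + k_1
\quad\Longrightarrow\quad
ds^2 = Z^2\,dT^2 - dx^2 - dy^2 - dZ^2

% Change to new coordinates (t', z'), valid for Z > 0:
t' = Z\sinh T, \qquad z' = Z\cosh T
\quad\Longrightarrow\quad
dt'^2 - dz'^2 = Z^2\,dT^2 - dZ^2

% so that
ds^2 = dt'^2 - dx^2 - dy^2 - dz'^2
```

The constants $k_1$ and $k_2$ therefore carry no physical information: one is a choice of origin for the height, the other a choice of units for time.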


What if we want to carry out the generalization from 1+1 to 3+1 
without violating the equivalence principle? For physical motivation 
in how to get past this obstacle, consider the following argument 
made by Born in 1920.¹⁴ Take a frame of reference tied to a rotating
disk, as in the example from which Einstein originally took much
of the motivation for creating a geometrical theory of gravity (sub- 
section 3.5.4, p. 109). Clocks near the edge of the disk run slowly, 
and by the equivalence principle, an observer on the disk interprets 
this as a gravitational time dilation. But this is not the only rela- 
tivistic effect seen by such an observer. Her rulers are also Lorentz 
contracted as seen by a non-rotating observer, and she interprets 
this as evidence of a non-Euclidean spatial geometry. There are 
some physical differences between the rotating disk and our default 
conception of a uniform field, specifically in the question of whether 
the metric should be static (i.e., lacking in cross-terms between the 
space and time variables). But even so, these considerations make 
it natural to hypothesize that the correct 3+1-dimensional metric 
should have transverse spatial coefficients that decrease with height. 


With this motivation, let’s consider a metric of the form

$$ds^2 = e^{2z}dt^2 - e^{-2jz}dx^2 - e^{-2kz}dy^2 - dz^2,$$

where j and k are constants, and I’ve taken g = 1 for convenience.¹⁵


The following Maxima code calculates the scalar curvature and the 
Einstein tensor: 


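    load(ctensor);                    /* component-tensor package */
    ct_coords:[t,x,y,z];              /* names of the coordinates */
    lg:matrix([exp(2*z),0,0,0],       /* covariant metric, with g=1 */
              [0,-exp(-2*j*z),0,0],
              [0,0,-exp(-2*k*z),0],
              [0,0,0,-1]
             );
    cmetric();                        /* compute the inverse metric */
    scurvature();                     /* scalar curvature */
    leinstein(true);                  /* covariant Einstein tensor */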


The output from line 9 shows that the scalar curvature is constant,
which is a necessary condition for any spacetime that we want to
think of as representing a uniform field. Inspecting the Einstein
tensor output by line 10, we find that in order to get $G_{xx}$ and $G_{yy}$
to vanish, we need j and k to be $(1 \pm \sqrt{3}i)/2$. By trial and error, we
find that assigning the complex-conjugate values to j and k makes
$G_{tt}$ and $G_{zz}$ vanish as well, so that we have a vacuum solution.
This solution is, unfortunately, complex, so it is not of any obvious
value as a physically meaningful result. Since the field equations
are nonlinear, we can’t use the usual trick of forming real-valued
superpositions of the complex solutions. We could try simply taking
the real part of the metric. This gives $g_{xx} = e^{-z}\cos\sqrt{3}z$ and
$g_{yy} = e^{-z}\sin\sqrt{3}z$, and is unsatisfactory because the metric becomes
degenerate (has a zero determinant) at $z = n\pi/2\sqrt{3}$, where n is an
integer.


¹⁴ Max Born, Einstein’s Theory of Relativity, 1920. In the 1962 Dover edition,
the relevant passage is on p. 320.

¹⁵ A metric of this general form is referred to as a Kasner metric. One usually
sees it written with a logarithmic change of variables, so that z appears in the
base rather than in the exponent.
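The complex values of j and k can be checked without rerunning the tensor calculation. For a diagonal metric whose coefficients are exponentials of z, working out the Ricci tensor by hand gives two vacuum conditions on the exponents: their sum must vanish, and the sum of their squares must vanish. These conditions are my own restatement for this sketch, not something quoted from the text, and the variable names are illustrative.

```python
# Exponents a in the metric coefficients exp(2*a*z):
# a = 1 for dt^2, a = -j for dx^2, a = -k for dy^2.
sqrt3 = 3 ** 0.5
j = complex(0.5, sqrt3 / 2)   # (1 + sqrt(3) i)/2
k = j.conjugate()             # (1 - sqrt(3) i)/2

exponents = [1, -j, -k]

# Assumed vacuum conditions for this family of metrics (derived by hand,
# not taken from the text): both of these sums should be zero.
sum_of_exponents = sum(exponents)
sum_of_squares = sum(a * a for a in exponents)

print(abs(sum_of_exponents))
print(abs(sum_of_squares))
```

The second condition requires the squares of j and k to add up to -1, which is impossible for real numbers. That is one way of seeing why no real-valued solution of this form exists.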


It turns out, however, that there is a very similar solution, found
by Petrov in 1962,¹⁶ that is real-valued. The Petrov metric, which
describes a spacetime with cylindrical symmetry, is:

$$ds^2 = -dr^2 - e^{-2r}dz^2 + e^{r}\left[2\sin\sqrt{3}r\,d\phi\,dt - \cos\sqrt{3}r\,(d\phi^2 - dt^2)\right].$$


Note that it has many features in common with the complex oscilla- 
tory solution we found above. There are transverse length contrac- 
tions that decay and oscillate in exactly the same way. The presence 
of the $d\phi\,dt$ term tells us that this is a non-static, rotating solution
— exactly like the one that Einstein and Born had in mind in their 
prototypical example! We typically obtain this type of effect due 
to frame dragging by some rotating massive body (see p. 149), and 
the Petrov solution can indeed be interpreted as the spacetime that 
exists in the vacuum on the exterior of an infinite, rigidly rotating 
cylinder of “dust” (see p. 132). 


The complicated Petrov metric might seem like the furthest pos- 
sible thing from a uniform gravitational field, but in fact it is about 
the closest thing general relativity provides to such a field. We 
first note that the metric has Killing vectors $\partial_z$,
$\partial_\phi$, and $\partial_t$, so it has at least three out of
the four translation symmetries we expect from a uniform field. By
analogy with electromagnetism, we would expect this symmetry to be
absent in the radial direction, since by Gauss’s law the electric
field of a line of charge falls off like $1/r$. But surprisingly, the
Petrov metric is also uniform radially. It is possible to give the
fourth Killing vector explicitly (it is
$\partial_r + z\partial_z + (1/2)(\sqrt{3}t - \phi)\partial_\phi - (1/2)(\sqrt{3}\phi + t)\partial_t$),
but it is perhaps more transparent to check that it represents a
field of constant strength (problem 5, p. 290).


[10] Petrov, in Recent Developments in General Relativity, 1962, Pergamon, p.
383. For a presentation that is freely accessible online, see Gibbons and Gielen,
“The Petrov and Kaigorodov-Ozsvath Solutions: Spacetime as a Group Manifold,”
arxiv.org/abs/0802.4082.


For insight into this surprising result, recall that in our attempt
at constructing the Cartesian version of this metric, we ran into the
problem that the metric became degenerate at $z = n\pi/2\sqrt{3}$. The
presence of the $d\phi\,dt$ term prevents this from happening in
Petrov’s cylindrical version; two of the metric’s diagonal components
can vanish at certain values of $r$, but the presence of the
off-diagonal component prevents the determinant from going to zero.
(The determinant is in fact equal to $-1$ everywhere.) What is
happening physically is that although the labeling of the $\phi$ and
$t$ coordinates suggests a time and an azimuthal angle, these two
coordinates are in fact treated completely symmetrically. At values
of $r$ where the cosine factor equals 1, the metric is diagonal, and
has signature $(t,\phi,r,z) = (+,-,-,-)$, but when the cosine equals
$-1$, this becomes $(-,+,-,-)$, so that $\phi$ is now the timelike
coordinate. This perfect symmetry between $\phi$ and $t$ is an extreme
example of frame-dragging, and is produced because of the specially
chosen rate of rotation of the dust cylinder, such that the velocity
of the dust at the outer surface is exactly $c$ (or approaches it).
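The claim about the determinant can be spot-checked numerically. The following is a minimal sketch, not taken from the book: it assumes the Petrov line element as written above, with the coordinates ordered $(t,\phi,r,z)$ and the cross term $2\sin\sqrt{3}r\,d\phi\,dt$ split symmetrically between $g_{t\phi}$ and $g_{\phi t}$. The function names and the sample radii are illustrative choices.

```python
import math

def petrov_metric(r):
    """Covariant components g_ab of the Petrov metric, coordinate order (t, phi, r, z)."""
    s = math.sqrt(3.0) * r
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0] = math.exp(r) * math.cos(s)             # coefficient of dt^2
    g[1][1] = -math.exp(r) * math.cos(s)            # coefficient of dphi^2
    g[0][1] = g[1][0] = math.exp(r) * math.sin(s)   # 2 sin(...) dphi dt, split symmetrically
    g[2][2] = -1.0                                  # coefficient of dr^2
    g[3][3] = -math.exp(-2.0 * r)                   # coefficient of dz^2
    return g

def det(m):
    """Determinant by cofactor expansion along the first row (adequate for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col in range(len(m)):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * m[0][col] * det(minor)
    return total

root3 = math.sqrt(3.0)
# Sample radii, including ones where cos(sqrt(3) r) equals 1, 0, and -1.
radii = [0.0, math.pi / (2 * root3), math.pi / root3, 0.7, 2.5]
dets = [det(petrov_metric(r)) for r in radii]
for r, d in zip(radii, dets):
    print(f"r = {r:.4f}   det g = {d:+.10f}")
```

The diagonal entries $g_{tt}$ and $g_{\phi\phi}$ both vanish at the second sample radius, yet the determinant stays at $-1$ because of the off-diagonal entry, which is the point made in the text.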


Classically, we would expect that a test particle released close 
enough to the cylinder would be pulled in by the gravitational at- 
traction and destroyed on impact, while a particle released farther 
away would fly off due to the centrifugal force, escaping and even- 
tually approaching a constant velocity. Neither of these would be 
anything like the experience of a test particle released in a uniform 
field. But consider a particle released at rest in the rotating frame
at a radius $r_1$ for which $\cos\sqrt{3}r_1 = 1$, so that $t$ is the
timelike coordinate. The particle accelerates (let’s say outward), but
at some point it arrives at an $r_2$ where the cosine equals zero, and
the $\phi$-$t$ part of the metric is purely of the form $d\phi\,dt$.
At this location, we can define local coordinates $u = \phi - t$ and
$v = \phi + t$, so that the metric depends only on $du^2 - dv^2$. One
of the coordinates, say $u$, is now the timelike one. Since our
particle is material, its world-line must be timelike, so it is swept
along in the $-\phi$ direction. Gibbons
and Gielen show that the particle will now come back inward, and 
continue forever by oscillating back and forth between two radii at 
which the cosine vanishes. 


7.5.1 Closed timelike curves 


This oscillation still doesn’t sound like the motion of a particle 
in a uniform field, but another strange thing happens, as we can 
see by taking another look at the values of r at which the cosine 
vanishes. At such a value of $r$, construct a curve of the form
($t$ = constant, $r$ = constant, $\phi$, $z$ = constant). This is a
closed curve, and its proper length is zero, i.e., it is lightlike.
This violates causality. A photon could travel around this path and
arrive at its starting point at the same time when it was emitted.
Something similarly weird happens to the test particle described
above: whereas it seems to fall sometimes up and sometimes down, in
fact it is always falling down, but sometimes it achieves this by
falling up while moving backward in time!
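As a short worked check of the statement about the closed curve, one can restrict the Petrov line element to $dt = dr = dz = 0$. This is only a sketch, using the form of the metric given earlier in this section and the $(+,-,-,-)$ sign convention in which timelike intervals have $ds^2 > 0$:

```latex
ds^2\Big|_{t,\,r,\,z\ \mathrm{const}} = -e^{r}\cos(\sqrt{3}\,r)\,d\phi^2 ,
\qquad
\cos(\sqrt{3}\,r) = 0
\;\Longleftrightarrow\;
r = \frac{(2n+1)\pi}{2\sqrt{3}} \quad (n \text{ an integer}).
```

At these radii $ds^2 = 0$ for every $d\phi$, so the circle is lightlike. In the intervening ranges of $r$ where the cosine is negative, $ds^2 > 0$ along the circle, so the same construction gives closed timelike curves.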


Although the Petrov metric violates causality, Gibbons and Gielen
have shown that it satisfies the chronology protection conjecture:
“In the context of causality violation we have shown that one cannot 
create CTCs [closed timelike curves] by spinning up a cylinder be- 
yond its critical angular velocity by shooting in particles on timelike 
or null curves.” 


We have an exact vacuum solution to the Einstein field equations 
that violates causality. This raises troublesome questions about the 
logical self-consistency of general relativity. A very readable and 
entertaining overview of these issues is given in the final chapter of 
Kip Thorne’s Black Holes and Time Warps: Einstein’s Outrageous 
Legacy. In a toy model constructed by Thorne’s students, involv- 
ing a billiard ball and a wormhole, it turned out that there always 
seemed to be self-consistent solutions to the ball’s equations of mo- 
tion, but they were not unique, and they often involved disquieting 
possibilities in which the ball went back in time and collided with its 
earlier self. Among other things, this seems to lead to a violation of 
conservation of mass-energy, since no mass was put into the system 
to create extra copies of the ball. This would then be an example 
of the fact that, as discussed in section 4.5.1, general relativity does 
not admit global conservation laws. However, there is also an
argument that the mouths of the wormhole change in mass in such a
way as to preserve conservation of energy.[11]


[11] http://golem.ph.utexas.edu/string/archives/000550.html




Problems 


1 Example 3 on page 263 gave the Killing vectors $\partial_z$ and $\partial_\phi$ of
a cylinder. If we express these instead as two linearly independent 
Killing vectors that are linear combinations of these two, what is 
the geometrical interpretation? 


2 Section 7.4 told the story of Alice trying to find evidence that 
her spacetime is not stationary, and also listed the following exam- 
ples of spacetimes that were not stationary: (a) the solar system, 
(b) cosmological models, (c) gravitational waves propagating at the 
speed of light, and (d) a cloud of matter undergoing gravitational 
collapse. For each of these, show that it is possible for Alice to 
accomplish her mission. 


3 In the Schwarzschild spacetime, test particles can have circular
orbits only for $r > r_c$, where $r_c = 3/2$ in units where the
Schwarzschild radius is 1. These orbits are unstable for $r < 3$ (the
innermost stable circular orbit). The unstable orbit with $r = r_c$
exists only for massless particles, and $r = r_c$ is called the photon
sphere. Consider the conserved quantities $E$ and $L$ corresponding to
the Schwarzschild spacetime’s Killing vectors $\partial_t$ and
$\partial_\phi$, interpreted as the energy and angular momentum per
unit mass. As the mass of a particle approaches zero, both of these
blow up to infinity if the affine parameter is taken to be the proper
time, but $L/E$ is well behaved in this limit. Show that a photon on
the photon sphere has $L/E = \pm(1/2)3^{3/2}$.


4 If a spacetime has a certain symmetry, then we expect that 
symmetry to be detectable in the behavior of curvature scalars such 
as the scalar curvature $R = R^a{}_a$ and the Kretschmann invariant
$k = R^{abcd}R_{abcd}$.

(a) Show that the metric 


$$ds^2 = e^{2gz}\,dt^2 - dx^2 - dy^2 - dz^2$$


from page 285 has constant values of $R = 1/2$ and $k = 1/4$. Note
that Maxima’s ctensor package has built-in functions for these; you
have to call lriemann and uriemann before calling them.

(b) Similarly, show that the Petrov metric 


$$ds^2 = -dr^2 - e^{-2r}\,dz^2 + e^{r}\left[2\sin\sqrt{3}r\,d\phi\,dt - \cos\sqrt{3}r\,(d\phi^2 - dt^2)\right]$$


(p. 287) has $R = 0$ and $k = 0$.


Remark: Surprisingly, one can have a spacetime on which every possible curva- 
ture invariant vanishes identically, and yet which is not flat. See Coley, Hervik, 
and Pelavas, “Spacetimes characterized by their scalar curvature invariants,” 
arxiv.org/abs/0901.0791v2. 


5 Section 7.5 on page 285 presented the Petrov metric. The
purpose of this problem is to verify that the gravitational field it
represents does not fall off with distance. For simplicity, let’s
restrict our attention to a particle released at an $r$ such that
$\cos\sqrt{3}r = 1$, so that $t$ is the timelike coordinate. Let the
particle be released at rest in the sense that initially it has
$\dot{z} = \dot{r} = \dot{\phi} = 0$, where dots represent
differentiation with respect to the particle’s proper time. Show that
the magnitude of the proper acceleration is independent of $r$.
> Solution, p. 397


6 The idea that a frame is “rotating” in general relativity can 
be formalized by saying that the frame is stationary but not static. 
Suppose someone says that any rotation must have a center. Give 
a counterexample. > Solution, p. 397 


7 In example 8, p. 267, we found the Doppler shifts ob- 
served by an observer infalling radially from rest at infinity into 
a Schwarzschild black hole. Carry out a similar analysis for the case 
where the observer is in a circular orbit, and show that such an 
observer will always see both blueshifts and redshifts. In order to 
find the motion of the observer in the circular orbit, you will need 
to either compute or look up online a couple of Christoffel symbols
for the Schwarzschild spacetime, in Schwarzschild coordinates.
> Solution, p. 398


Chapter 8
Sources 


8.1 Sources in general relativity 


8.1.1 Point sources in a background-independent theory 


The Schrödinger equation and Maxwell’s equations treat spacetime
as a stage on which particles and fields act out their roles.
General relativity, however, is essentially a theory of spacetime it- 
self. The role played by atoms or rays of light is so peripheral 
that by the time Einstein had derived an approximate version of 
the Schwarzschild metric, and used it to find the precession of Mer- 
cury’s perihelion, he still had only vague ideas of how light and mat- 
ter would fit into the picture. In his calculation, Mercury played the 
role of a test particle: a lump of mass so tiny that it can be tossed 
into spacetime in order to measure spacetime’s curvature, without 
worrying about its effect on the spacetime, which is assumed to be 
negligible. Likewise the sun was treated as in one of those orches- 
tral pieces in which some of the brass play from off-stage, so as to 
produce the effect of a second band heard from a distance. Its mass 
appears simply as an adjustable parameter m in the metric, and if 
we had never heard of the Newtonian theory we would have had no 
way of knowing how to interpret m. 


When Schwarzschild published his exact solution to the vacuum 
field equations, Einstein suffered from philosophical indigestion. His 
strong belief in Mach’s principle led him to believe that there was a 
paradox implicit in an exact spacetime with only one mass in it. If 
Einstein’s field equations were to mean anything, he believed that 
they had to be interpreted in terms of the motion of one body rela- 
tive to another. In a universe with only one massive particle, there 
would be no relative motion, and so, it seemed to him, no motion 
of any kind, and no meaningful interpretation for the surrounding 
spacetime. 


Not only that, but Schwarzschild’s solution had a singularity 
at its center. When a classical field theory contains singularities, 
Einstein believed, it contains the seeds of its own destruction. As 
we’ve seen on page 242, this issue is still far from being resolved, a 
century later. 


However much he might have liked to disown it, Einstein was 
now in possession of a solution to his field equations for a point 
source. In a linear, background-dependent theory like
electromagnetism, knowledge of such a solution leads directly to the
ability to write down the field equations with sources included. If
Coulomb’s law tells us the $1/r^2$ variation of the electric field of
a point charge, then we can infer Gauss’s law. The situation in
general relativity is not this simple. The field equations of general
relativity, unlike Gauss’s law, are nonlinear, so we can’t simply say
that a planet
or a star is a solution to be found by adding up a large number of 
point-source solutions. It’s also not clear how one could represent a 
moving source, since the singularity is a point that isn’t even part 
of the continuous structure of spacetime (and its location is also 
hidden behind an event horizon, so it can’t be observed from the 
outside). 


8.1.2 The Einstein field equation 
The Einstein tensor 


Given these difficulties, it’s not surprising that Einstein’s first 
attempt at incorporating sources into his field equation was a dead 
end. He postulated that the field equation would have the Ricci
tensor on one side, and the stress-energy tensor $T^{ab}$ (page 161)
on the other,

$$R_{ab} = 8\pi T_{ab},$$

where a factor of $G/c^4$ on the right is suppressed by our choice
of units, and the $8\pi$ is determined on the basis of consistency
with Newtonian gravity in the limit of weak fields and low velocities.
The problem with this version of the field equations can be
demonstrated by counting variables. $R$ and $T$ are symmetric tensors,
so the field equation contains 10 constraints on the metric: 4 from
the diagonal elements and 6 from the off-diagonal ones.


In addition, local conservation of mass-energy requires the
divergence-free property $\nabla_b T^{ab} = 0$. In order to construct
an example, we recall that the only component of $T$ for which we have
so far introduced any physical interpretation is $T^{tt}$, which gives
the density of mass-energy. Suppose we had a stress-energy tensor
whose components were all zero, except for a time-time component
varying as $T^{tt} = kt$. This would describe a region of space in
which mass-energy was uniformly appearing or disappearing everywhere
at a constant rate. To forbid such examples, we need the
divergence-free property to hold. This is exactly analogous to the
continuity equation in fluid mechanics or electromagnetism,
$\partial\rho/\partial t + \nabla\cdot\mathbf{J} = 0$ (or
$\nabla_a J^a = 0$), which states that the quantity of fluid or charge
is conserved.


But imposing the divergence-free condition adds 4 more con- 
straints on the metric, for a total of 14. The metric, however, is a 
symmetric rank-2 tensor itself, so it only has 10 independent com- 
ponents. This overdetermination of the metric suggests that the 
proposed field equation will not in general allow a solution to be 
evolved forward in time from a set of initial conditions given on a
spacelike surface, and this turns out to be true. It can in fact be
shown that the only possible solutions are those in which the traces
$R = R^a{}_a$ and $T = T^a{}_a$ are constant throughout spacetime.


The solution is to replace $R_{ab}$ in the field equations with a
different tensor $G_{ab}$, called the Einstein tensor, defined by
$G_{ab} = R_{ab} - (1/2)Rg_{ab}$,

$$G_{ab} = 8\pi T_{ab}.$$


The Einstein tensor is constructed exactly so that it is
divergence-free, $\nabla_b G^{ab} = 0$. (This is not obvious, but can
be proved by direct computation.) Therefore any stress-energy tensor
that satisfies the
field equation is automatically divergenceless, and thus no additional 
constraints need to be applied in order to guarantee conservation of 
mass-energy. 
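One standard route to the divergence-free property, sketched here rather than reproduced from the text, assumes two inputs: metric compatibility, $\nabla_c g^{ab} = 0$, and the contracted Bianchi identity, $\nabla_b R^{ab} = \tfrac{1}{2}\nabla^a R$, which follows from contracting the Bianchi identity for the Riemann tensor twice. Given these, the computation is one line:

```latex
\nabla_b G^{ab}
  = \nabla_b R^{ab} - \tfrac{1}{2}\, g^{ab}\, \nabla_b R
  = \tfrac{1}{2}\nabla^a R - \tfrac{1}{2}\nabla^a R
  = 0 .
```

The factor of $1/2$ in the definition of $G_{ab}$ is exactly what is needed for the two terms to cancel. With $R_{ab}$ alone on the left-hand side, the leftover $\tfrac{1}{2}\nabla^a R$ would have to vanish separately, which is the constancy-of-the-trace restriction mentioned above.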


Self-check: Does replacing $R_{ab}$ with $G_{ab}$ invalidate the
Schwarzschild metric?


This procedure of making local conservation of mass-energy “baked
in” to the field equations is analogous to the way conservation of
charge is treated in electricity and magnetism, where it follows from 
Maxwell’s equations rather than having to be added as a separate 
constraint. 


Interpretation of the stress-energy tensor 


The stress-energy tensor was briefly introduced in section 5.2 on
page 161. By applying the Newtonian limit of the field equation
to the Schwarzschild metric, we find that $T^{tt}$ is to be identified
as the mass density $\rho$. The Schwarzschild metric describes a
spacetime using coordinates in which the mass is at rest. In the
cosmological applications we’ll be considering shortly, it also makes
sense to adopt a frame of reference in which the local mass-energy is,
on average, at rest, so we can continue to think of $T^{tt}$ as the
(average) mass density. By symmetry, $T$ must be diagonal in such a
frame. For example, if we had $T^{tx} \ne 0$, then the positive $x$
direction would be distinguished from the negative $x$ direction, but
there is nothing that would allow such a distinction.


Example 1: Dust in a different frame
As discussed in example 14 on page 132, it is convenient in
cosmology to distinguish between radiation and “dust,” meaning
noninteracting, nonrelativistic materials such as hydrogen gas or
galaxies. Here “nonrelativistic” means that in the comoving frame,
in which the average flow of dust vanishes, the dust particles all
have $|v| \ll 1$. What is the stress-energy tensor associated with
dust?


Since the dust is nonrelativistic, we can obtain the Newtonian limit
by using units in which $c \ne 1$, and letting $c$ approach infinity.
In Cartesian coordinates, the components of the stress-energy have
units that cause them to scale like

$$T^{\mu\nu} \propto \begin{pmatrix}
1 & 1/c & 1/c & 1/c \\
1/c & 1/c^2 & 1/c^2 & 1/c^2 \\
1/c & 1/c^2 & 1/c^2 & 1/c^2 \\
1/c & 1/c^2 & 1/c^2 & 1/c^2
\end{pmatrix}.$$


In the limit of $c \to \infty$, we can therefore take the only source
of gravitational fields to be $T^{tt}$, which in Newtonian gravity
must be the mass density $\rho$, so

$$T^{\mu\nu} = \begin{pmatrix}
\rho & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}.$$

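Under a Lorentz boost by $v$ in the $x$ direction, the tensor
transformation law gives

$$T^{\mu'\nu'} = \begin{pmatrix}
\gamma^2\rho & \gamma^2 v\rho & 0 & 0 \\
\gamma^2 v\rho & \gamma^2 v^2\rho & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}.$$

The over-all factor of $\gamma^2$ arises because of the combination
of two effects: each dust particle’s mass-energy is increased by a
factor of $\gamma$, and length contraction also multiplies the
density of dust particles by a factor of $\gamma$. In the limit of
small boosts, the stress-energy tensor becomes

$$T^{\mu'\nu'} \approx \begin{pmatrix}
\rho & v\rho & 0 & 0 \\
v\rho & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}.$$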

This motivates the interpretation of the time-space components
of $T$ as the flux of mass-energy along each axis. In the primed
frame, mass-energy with density $\rho$ flows in the $x$ direction at
velocity $v$, so that the rate at which mass-energy passes through a
window of area $A$ in the $y$-$z$ plane is given by $\rho vA$.


This is also consistent with our imposition of the divergence-free
property, by which we were essentially stating $T^{tx}$ to be the rate
of flow of $T^{tt}$.
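The boosted components in example 1 can be checked numerically. The sketch below is an illustration under stated assumptions rather than the book's own calculation: it uses units with $c = 1$, takes the boost matrix with positive off-diagonal entries so that the dust moves in the $+x$ direction in the primed frame, and applies $T^{\mu'\nu'} = \Lambda^{\mu'}{}_{\alpha}\Lambda^{\nu'}{}_{\beta}T^{\alpha\beta}$ as a matrix product. The values of the density and velocity, and all function names, are arbitrary illustrative choices.

```python
import math

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def boost_x(v):
    """Lorentz matrix (c = 1) to a frame in which matter at rest moves at +v along x."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    lam = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    lam[0][0] = lam[1][1] = gamma
    lam[0][1] = lam[1][0] = gamma * v
    return lam

def boosted_dust(rho, v):
    """Transform the rest-frame dust tensor diag(rho, 0, 0, 0) with two factors of the boost."""
    t_rest = [[0.0] * 4 for _ in range(4)]
    t_rest[0][0] = rho
    lam = boost_x(v)
    lam_transposed = [list(col) for col in zip(*lam)]
    return matmul(matmul(lam, t_rest), lam_transposed)

rho, v = 1.0, 0.6                      # illustrative values, not from the text
gamma = 1.0 / math.sqrt(1.0 - v * v)
t_fast = boosted_dust(rho, v)          # relativistic boost
t_slow = boosted_dust(rho, 1e-3)       # small boost, for the low-velocity limit
for row in t_fast:
    print(["%.4f" % x for x in row])
```

For the small boost, the time-space component is very nearly $v\rho$ and the $xx$ component is of order $v^2$, which matches the limiting form quoted in the example.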


Example 2: The center of mass-energy
In Newtonian mechanics, for motion in one dimension, the total
momentum of a system of particles is given by $p_{tot} = Mv_{cm}$,
where $M$ is the total mass and $v_{cm}$ the velocity of the center of
mass. Is there such a relation in relativity?


Since mass and energy are equivalent, we expect that the rela- 
tivistic equivalent of the center of mass would have to be a center 
of mass-energy. 


It should also be clear that a center of mass-energy can only be 
well defined for a region of spacetime that is small enough so that 
effects due to curvature are negligible. For example, we can have 
cosmological models in which space is finite, and expands like the 
surface of a balloon being blown up. If the model is homogeneous 
(there are no “special points” on the surface of the balloon), then 
there is no point in space that could be a center. (A real balloon 
has a center, but in our metaphor only the balloon’s spherical surface
corresponds to physical space.) The fundamental issue here
is the same geometrical one that caused us to conclude that there 
is no global conservation of mass-energy in general relativity (see 
section 4.5.1). In a curved spacetime, parallel transport is path- 
dependent, so we can’t unambiguously define a way of adding 
vectors that occur in different places. The center of mass is de- 
fined by a sum of position vectors. From these considerations we 
conclude that the center of mass-energy is only well defined in 
special relativity, not general relativity. 


For simplicity, let’s restrict ourselves to 1+1 dimensions, and adopt 
a frame of reference in which the center of mass is at rest at x = 0. 


Since $T^{tt}$ is interpreted as the density of mass-energy, the
position of the center of mass must be given by

$$0 = \int T^{tt}\,x\,dx.$$

By analogy with the Newtonian relation $p_{tot} = Mv_{cm}$, let’s see
what happens when we differentiate with respect to time. The velocity
of the center of mass is then
$0 = dx_{cm}/dt = \int \partial_t T^{tt}\,x\,dx$. Applying the
divergence-free property $\partial_t T^{tt} + \partial_x T^{tx} = 0$,
this becomes $0 = -\int \partial_x T^{tx}\,x\,dx$. Integration by
parts (the boundary term vanishes for a system of finite extent)
gives us finally

$$0 = \int T^{tx}\,dx.$$

We’ve already interpreted $T^{tx}$ as the rate of flow of mass-energy,
which is another way of describing momentum. We can therefore
interpret $T^{tx}$ as the density of momentum, and the right-hand side
of this equation as the total momentum. The interpretation is that
a system’s center of mass-energy is at rest if and only if it has
zero total momentum.


Suppose, for example, that we prepare a uniform metal rod so 
that one end is hot and the other cold. We then deposit it in outer 
space, initially motionless relative to some observer. Although 
the rod itself is uniform, its mass-energy is very slightly
nonuniform, so its center of mass-energy must be displaced a tiny bit
away from the center, toward the hot end. As the rod approaches
thermal equilibrium, the observer sees it accelerate very slightly 
and then come to rest again, so that its center of mass-energy 
remains fixed! An even stranger case is described in example 9 
on p. 312. 


Since the Einstein tensor is symmetric, the Einstein field equa- 
tion requires that the stress-energy tensor be symmetric as well. It 
is reassuring that according to example 1 the tensor is symmetric 
for dust, and that symmetry is preserved by changes of coordinates 
and by superpositions of sources. Besides dust, the other cosmo- 
logically significant sources of gravity are electromagnetic radiation 
and the cosmological constant, and one can also check that these 
give symmetry. Belinfante noted in 1939 that symmetry seemed to 
fail in the case of fields with intrinsic spin, but he found that this 
problem could be avoided by modifying the previously assumed way 
of connecting T to the properties of the field. This shows that it can 
be rather subtle to interpret the stress-energy tensor and connect it 
to experimental observables. For more on this connection, and the 
case of electromagnetic fields, see examples 7 and 8 on p. 309. 


In example 1, we found that $T^{tx}$ had to be interpreted as the
flux of $T^{tt}$ (i.e., the flux of mass-energy) across the $x$ axis.
Lorentz invariance requires that we treat $t$, $x$, $y$, and $z$
symmetrically, and this forces us to adopt the following
interpretation: $T^{\mu\nu}$, where $\mu$ is spacelike, is the flux of
the density of the mass-energy four-vector in the $\mu$ direction. In
the comoving frame, in Cartesian coordinates, this means that
$T^{xx}$, $T^{yy}$, and $T^{zz}$ should be interpreted as pressures.
For example, $T^{xx}$ is the flux in the $x$ direction of
$x$-momentum. This is simply the pressure, $P$, that would be exerted
on a surface with its normal in the $x$ direction, so in the comoving
frame we have $T^{\mu\nu} = \operatorname{diag}(\rho, P, P, P)$. For a
fluid that is not in equilibrium, the pressure need not be isotropic,
and the stress exerted by the fluid need not be perpendicular to the
surface on which it acts. The space-space components of $T$ would then
be the classical stress tensor, whose diagonal elements are the
anisotropic pressure, and whose off-diagonal elements are the shear
stress. This is the reason for calling $T$ the stress-energy tensor.


The prediction of general relativity is then that pressure acts as a 
gravitational source with exactly the same strength as mass-energy 
density. This has important implications for cosmology, since the 
early universe was dominated by radiation, and a photon gas has
$P = \rho/3$ (example 14, p. 132).


Experimental tests 


But how do we know that this prediction is even correct? Can 
it be verified in the laboratory? The classic laboratory test of the 
strength of a gravitational source is the 1797 Cavendish experiment, 
in which a torsion balance was used to measure the very weak
gravitational attractions between metal spheres. We could test this
aspect of general relativity by doing a Cavendish experiment with
boxes full of photons, so that the pressure is of the same order
of magnitude as the mass-energy. This is unfortunately utterly
impractical, since both $P$ and $\rho$ for a well-lit box are
ridiculously small compared to $\rho$ for a metal ball.


However, the repulsive electromagnetic pressure inside an atomic
nucleus is quite large by ordinary standards: about $10^{33}$ Pa! To
see how big this is compared to the nuclear mass density of
$\rho \sim 10^{18}\ \mathrm{kg/m^3}$, we need to take into account the
factor of $c^2 \ne 1$ in SI units, the result being that $P/\rho$ is
about $10^{-2}$, which is not too small. Thus if we measure
gravitational interactions of nuclei with different values of
$P/\rho$, we should be able to test this prediction of general
relativity. This was done in a Princeton PhD-thesis experiment by
Kreuzer[1] in 1966.
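The order-of-magnitude comparison can be reproduced with a few lines of arithmetic. This is a rough sketch that takes the pressure and density to be the order-of-magnitude figures $10^{33}$ Pa and $10^{18}\ \mathrm{kg/m^3}$ read from the paragraph above; it is not a nuclear-structure calculation, and the variable names are illustrative.

```python
# Order-of-magnitude inputs assumed from the surrounding text.
P = 1.0e33        # Pa, electromagnetic pressure inside a nucleus
rho = 1.0e18      # kg/m^3, nuclear mass density
c = 2.998e8       # m/s, speed of light

# In SI units the mass density must be multiplied by c^2 to be compared with a pressure.
energy_density = rho * c ** 2      # J/m^3, which has the same units as Pa
ratio = P / energy_density         # dimensionless P/rho in units with c = 1

print(f"rho c^2 ~ {energy_density:.1e} Pa")
print(f"P/(rho c^2) ~ {ratio:.1e}")
```

The ratio comes out at the percent level, consistent with the figure of about $10^{-2}$ quoted in the text.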


Before we can properly describe and interpret the Kreuzer experiment,
we need to distinguish the several different types of mass that could
in principle be different from one another in a theory of gravity.
We’ve already encountered the distinction between inertial and
gravitational mass, which Eötvös experiments (p. 22) show to be
equivalent to about one part in $10^{12}$. But there is also a
distinction between an object’s active gravitational mass $m_a$, which
measures its ability to create gravitational fields, and its passive
gravitational mass $m_p$, which measures the force it feels when
placed in an externally generated field. For experiments using
laboratory-scale material objects at nonrelativistic velocities, the
Newtonian limit applies, and we can think of active gravitational mass
as a scalar, with a density $T^{tt} = \rho$.


To understand how this relates to pressure as a source of
gravitational fields, it is helpful to consider a case where $P$ is
about the same as $\rho$, which occurs for light. Light is inherently
relativistic, so
the Newtonian concept of a scalar gravitational mass breaks down, 
but we can still use “mass” in quotes to talk qualitatively about 
an electromagnetic wave’s active and passive participation in grav- 
itational effects. Experiments show that general relativity correctly 
predicts the deflection of light by the sun to about one part in 10° 
(p. 233). This is the electromagnetic equivalent of an Eötvös
experiment; it shows that general relativity predicts the right thing about
the proportion between a light wave’s inertial and passive gravita- 
tional “masses.” Now suppose that general relativity was wrong, 
and pressure was not a source of gravitational fields. This would 
cause a drastic decrease in the active gravitational “mass” of an 
electromagnetic wave. 


The Kreuzer experiment actually dealt with static electric fields
inside nuclei, not electromagnetic waves, but it is still clear what
we should expect in general: if pressure does not act as a
gravitational source, then the ratio $m_a/m_p$ should be different for
different nuclei. Specifically, it should be lower for a nucleus with
a higher atomic number $Z$, in which the electrostatic pressures are
higher.


[1] Kreuzer, Phys. Rev. 169 (1968) 1007


a / A Cavendish balance, used to determine the gravitational constant.

b / A simplified diagram of Kreuzer’s modification. The moving teflon
mass is submerged in a liquid with nearly the same density.

c / The Kreuzer experiment. 1. There are two passive masses, P, and an
active mass A consisting of a single 23-cm diameter teflon cylinder
immersed in a fluid. The teflon cylinder is driven back and forth with
a period of 400 s. The resulting deflection of the torsion beam is
monitored by an optical lever and canceled actively by electrostatic
forces from capacitor plates (not shown). The voltage required for
this active cancellation is a measure of the torque exerted by A on
the torsion beam. 2. Active mass as a function of temperature.
3. Passive mass as a function of temperature. In both 2 and 3,
temperature is measured in units of ohms, i.e., the uncalibrated units
of a thermistor that was immersed in the liquid.


Kreuzer did a Cavendish experiment, figure b, using masses made of two
different substances. The first substance was teflon. The second
substance was a mixture of the liquids trichloroethylene and
dibromoethane, with the proportions chosen so as to give a density as
close as possible to that of teflon. Teflon is 76% fluorine by weight,
and the liquid is 74% bromine. Fluorine has atomic number $Z = 9$,
bromine $Z = 35$, and since the electromagnetic force has a long
range, the pressure within a nucleus scales upward roughly like
$Z^{1/3}$ (because any given proton is acted on by $Z - 1$ other
protons, and the size of a nucleus scales like $Z^{1/3}$, so
$P \propto Z/(Z^{1/3})^2$). The solid mass was immersed in the liquid,
and the combined gravitational field of the solid and the liquid was
detected by a Cavendish balance.


Ideally, one would formulate the liquid mixture so that its
passive-mass density was exactly equal to that of teflon, as
determined by buoyancy. Any oscillation in the torque measured by the
Cavendish balance would then indicate an inequivalence between active
and passive gravitational mass.
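To get a feel for how different the two nuclei are, the scaling argument for the nuclear pressure can be evaluated for fluorine and bromine. This is only a sketch of the rough estimate $P \propto Z/(Z^{1/3})^2$ described above, with $Z - 1$ approximated by $Z$; the function name is an illustrative choice, and the overall constant of proportionality is left out because only the ratio matters.

```python
def pressure_scale(Z):
    """Relative electrostatic pressure: ~Z other protons acting over an area ~ (Z^(1/3))^2."""
    radius = Z ** (1.0 / 3.0)      # nuclear size in arbitrary units
    return Z / radius ** 2

Z_fluorine, Z_bromine = 9, 35
ratio = pressure_scale(Z_bromine) / pressure_scale(Z_fluorine)
print(f"P(Br)/P(F) ~ {ratio:.2f}")
```

On this crude estimate the pressure in a bromine nucleus is larger than in a fluorine nucleus by a factor of roughly 1.6, so the two substances probe noticeably different values of the pressure for nearly the same bulk density.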


In reality, the two substances involved had different coefficients 
of thermal expansion, so slight variations in temperature made their 
passive-mass densities unequal. Kreuzer therefore measured both 
the buoyant force and the gravitational torque as functions of
temperature. He determined that these became zero at the same
temperature, to within experimental errors, which verified the
equivalence of active and passive gravitational mass to within a
certain precision,

$$m_p \propto m_a$$

to within $5 \times 10^{-5}$.


Kreuzer intended this experiment only as a test of $m_p \propto m_a$,
but it was reinterpreted in 1976 by Will[2] as a test of the coupling
of sources to gravitational fields as predicted by general relativity
and other theories of gravity. Crudely, we’ve already argued that
$m_p/m_a$ would be substance-dependent if pressure did not couple to
gravitational fields. Will actually carried out a more careful
calculation, of which I present a simplified summary. Suppose that
pressure does not contribute as much to gravitational fields as is
claimed by general relativity; its coupling is reduced by a factor
$1 - x$, where $x = 0$ in general relativity.[3] Will considers a
model consisting of pointlike particles interacting through static
electrical forces, and shows that for such a system,

$$m_a = m_p + \frac{1}{2}xU_e,$$

where $U_e$ is the electrical energy. The Kreuzer experiment then
requires $|x| < 6 \times 10^{-2}$, meaning that pressure does
contribute to gravitational fields as predicted by general relativity,
to within a precision of 6%.


One of the important ways in which Will’s calculation goes beyond
my previous crude argument is that it shows that when $x = 0$,
as it does for general relativity, the correction term $xU_e/2$ vanishes,
and $m_a = m_p$ exactly. This is interpreted as follows. Let a bromine
nucleus be referred to with a capital M, fluorine with the lowercase
m. Then when a bromine nucleus and a fluorine nucleus interact
gravitationally at a distance r, the Newtonian approximation
applies, and the total internal force acting on the pair of nuclei taken
as a whole equals $(m_p M_a - M_p m_a)/r^2$ (in units where the
Newtonian gravitational constant G equals 1). This vanishes only if
$m_p M_a - M_p m_a = 0$, which is equivalent to $m_p/M_p = m_a/M_a$. If
this proportionality fails, then the system violates Newton’s third
law and conservation of momentum; its center of mass will accelerate
along the line connecting the two nuclei, either in the direction
of M or in the direction of m, depending on the sign of x.
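
To see explicitly how the net force depends on x, one can substitute Will's relation between active and passive mass into the expression for the internal force. This is only a sketch: the symbols $U_M$ and $U_m$ are labels introduced here (they do not appear in the text) for the electrical energies of the bromine and fluorine nuclei.

```latex
\begin{align*}
F_\text{net} &= \frac{m_p M_a - M_p m_a}{r^2} \\
  &= \frac{m_p\left(M_p + \tfrac{1}{2} x U_M\right)
         - M_p\left(m_p + \tfrac{1}{2} x U_m\right)}{r^2} \\
  &= \frac{x}{2r^2}\left(m_p U_M - M_p U_m\right)
\end{align*}
```

Under these assumptions the net force is proportional to x. It vanishes either when x = 0 or when the two nuclei happen to have the same ratio of electrical energy to passive mass.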


²Will, “Active mass in relativistic gravity: Theoretical interpretation of
the Kreuzer experiment,” Ap. J. 204 (1976) 234, available online at
adsabs.harvard.edu. A broader review of experimental tests of general relativity is
given in Will, “The Confrontation between General Relativity and Experiment,”
https://arxiv.org/abs/1403.7377. The Kreuzer experiment is discussed in
section 4.4.3.

³In Will’s notation, $\zeta_4$ measures nonstandard coupling to pressure, $\zeta_3$ to
internal energy, and $\zeta_1$ to kinetic energy. By requiring that point-particle models
agree with perfect-fluid models, one obtains $(-2/3)\zeta_1 = \zeta_3 = -\zeta_4 = x$.

d/The Apollo 11 mission left
behind this mirror, which in this 
photo shows the reflection of the 
black sky. The mirror is used 
for lunar laser ranging measure- 
ments, which have an accuracy 
of about a centimeter. 


Thus the vanishing of the correction term $xU_e/2$ tells us that
general relativity predicts exact conservation of momentum in this
interaction. This is comforting, but a little surprising on the face
of it. Newtonian gravity treats active and passive mass perfectly
symmetrically, so that there is a perfect guarantee of conservation of 
momentum. But relativity incorporates them in a completely asym- 
metric manner, so there is no obvious reason that we should have 
perfect conservation of momentum. In fact we don’t have any gen- 
eral guarantee of conservation of momentum, since, as discussed in 
section 4.5.1 on page 148, the language of general relativity doesn’t 
even give us the symbols we would need in order to state a global 
conservation law for a vector. General relativity does, however, 
allow local conservation laws. We will have local conservation of 
mass-energy and momentum provided that the stress-energy tensor’s
divergence $\nabla_b T^{ab}$ vanishes.


Bartlett and van Buren⁴ used this connection to conservation of
momentum in 1986 to derive a tighter limit on x. Since the moon
has an asymmetrical distribution of iron and aluminum, a nonzero
x would cause it to have an anomalous acceleration along a certain
line. Because lunar laser ranging gives extremely accurate data on
the moon’s orbit, the constraint is tightened to $|x| < 1 \times 10^{-8}$.


These are tests of general relativity’s predictions about the grav- 
itational fields generated by the pressure of a static electric field. In 
addition, there is indirect confirmation (p. 332) that general relativ- 
ity is correct when it comes to electromagnetic waves. 


Energy of gravitational fields not included in the stress-energy 
tensor 


Summarizing the story of the Kreuzer and Bartlett-van Buren 
results, we find that observations verify to high precision one of the 
defining properties of general relativity, which is that all forms of 
energy are equivalent to mass. That is, Einstein’s famous $E = mc^2$
can be extended to gravitational effects, with the proviso that the
source of gravitational fields is not really a scalar m but the
stress-energy tensor T.


But there is an exception to this even-handed treatment of all 
types of energy, which is that the energy of the gravitational field 
itself is not included in T, and is not even generally a well-defined
concept locally. In Newtonian gravity, we can have conservation of
energy if we attribute to the gravitational field a negative potential
energy density $-g^2/8\pi$. But the equivalence principle tells us that
g is not a tensor, for we can always make g vanish locally by going 
into the frame of a free-falling observer, and yet the tensor trans- 
formation laws will never change a nonzero tensor to a zero tensor 


⁴Phys. Rev. Lett. 57 (1986) 21. The result is summarized in section 3.7.3 of
the review by Will.

under a change of coordinates. Since the gravitational field is not
a tensor, there is no way to add a term for it into the definition of 
the stress-energy, which is a tensor. The grammar and vocabulary 
of the tensor notation are specifically designed to prevent writing 
down such a thing, so that the language of general relativity is not 
even capable of expressing the idea that gravitational fields would 
themselves contribute to T. 


Self-check: (1) Convince yourself that the negative sign in the
expression $-g^2/8\pi$ makes sense, by considering the case where two
equal masses start out far apart and then fall together and combine
to make a single body with twice the mass. (2) The Newtonian
gravitational field is the gradient of the gravitational potential $\phi$,
which corresponds in the Newtonian limit to the time-time component
of the metric. With this motivation, suppose someone proposes
generalizing the Newtonian energy density $-(\nabla\phi)^2/8\pi$ to a similar
expression such as $-(\nabla_a g^c{}_b)(\nabla^a g_c{}^b)$, where $\nabla$ is now the covariant
derivative, and g is the metric, not the Newtonian field strength.
What goes wrong? 


As a concrete example, we observe that the Hulse-Taylor binary 
pulsar system (p. 232) is gradually losing orbital energy, and that 
the rate of energy loss exactly matches general relativity’s prediction 
of the rate of gravitational radiation. There is a net decrease in 
the forms of energy, such as rest mass and kinetic energy, that are 
accounted for in the stress-energy tensor T. We can account for
the missing energy by attributing it to the outgoing gravitational 
waves, but that energy is not included in T, and we have to develop 
special techniques for evaluating that energy. Those techniques only 
turn out to apply to certain special types of spacetimes, such as 
asymptotically flat ones, and they do not allow a uniquely defined 
energy density to be attributed to a particular small region of space 
(for if they did, that would violate the equivalence principle). 


Gravitational energy is locally unmeasurable. Example: 3 
When a new form of energy is discovered, the way we estab- 
lish that it is a form of energy is that it can be transformed to or 
from other forms of energy. For example, Becquerel discovered 
radioactivity by noticing that photographic plates left in a desk 
drawer along with radium salts became clouded: some new form 
of energy had been converted into previously known forms such 
as chemical energy. It is only in this limited sense that energy is 
ever locally observable, and this limitation prevents us from mean- 
ingfully defining certain measures of energy. For example we can 
never measure the local electrical potential in the same sense 
that we can measure the local barometric pressure; a potential 
of 137 volts only has meaning relative to some other region of 
space taken to be at ground. Let’s use the acronym MELT to re- 
fer to measurement of energy by the local transformation of that 
energy from one form into another.

The reason MELT works is that energy (or actually the momentum
four-vector) is locally conserved, as expressed by the zero-divergence
property of the stress-energy tensor. Without conservation,
there is no notion of transformation. The Einstein field
equations imply this zero-divergence property, and the field equations
have been well verified by a variety of observations, including
many observations (such as solar system tests and observation
of the Hulse-Taylor system) that in Newtonian terms would
be described as involving (non-local) transformations between kinetic
energy and the energy of the gravitational field. This agreement
with observation is achieved by taking T = 0 in vacuum,
regardless of the gravitational field. Therefore any local transformation
of gravitational field energy into another form of energy
would be inconsistent with previous observation. This implies that
MELT is impossible for gravitational field energy.


In particular, suppose that observer A carries out a local MELT 
of gravitational field energy, and that A sees this as a process in 
which the gravitational field is reduced in intensity, causing the 
release of some other form of energy such as heat. Now con- 
sider the situation as seen by observer B, who is free-falling in 
the same local region. B says that there was never any gravita- 
tional field in the first place, and therefore sees heat as violating 
local conservation of energy. In B’s frame, this is a nonzero di- 
vergence of the stress-energy tensor, which falsifies the Einstein 
field equations. 


Some examples 


We conclude this introduction to the stress-energy tensor with
some illustrative examples.

A perfect fluid Example: 4
For a perfect fluid, we have

$T_{ab} = (\rho + P) v_a v_b - s P g_{ab}$,

where s = 1 for our $+---$ signature or −1 for the signature
$-+++$, and v represents the coordinate velocity of the fluid’s rest
frame.

Suppose that the metric is diagonal, but its components are varying,
$g_{\alpha\beta} = \operatorname{diag}(A^2, -B^2, \ldots)$. The properly normalized velocity
vector of an observer at (coordinate-)rest is $v^\alpha = (A^{-1}, 0, 0, 0)$.
Lowering the index gives $v_\alpha = (sA, 0, 0, 0)$. The various forms of
the stress-energy tensor then look like the following:

$T_{00} = A^2 \rho$          $T_{11} = B^2 P$
$T^0{}_0 = s\rho$            $T^1{}_1 = -sP$
$T^{00} = A^{-2} \rho$       $T^{11} = B^{-2} P$.
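
The table of components in example 4 can be reproduced numerically. The following is a minimal sketch, not part of the original text: the function name `perfect_fluid_components` is hypothetical, only the t and x directions are kept, and the metric is assumed to be $s\,\operatorname{diag}(A^2, -B^2)$ so that the same code covers both signature conventions.

```python
def perfect_fluid_components(rho, P, A, B, s=1):
    """Diagonal components of a perfect fluid's stress-energy tensor.

    Assumptions (for illustration only): two dimensions (t, x) and a
    metric g = s*diag(A^2, -B^2), so s = +1 is the +--- signature.
    """
    g = [s * A**2, -s * B**2]            # covariant metric: g_00, g_11
    g_inv = [1.0 / g[0], 1.0 / g[1]]     # contravariant metric: g^00, g^11
    v_up = [1.0 / A, 0.0]                # static observer, v^a = (1/A, 0)
    v_dn = [g[i] * v_up[i] for i in range(2)]   # lowered index: (s*A, 0)

    # T_ab = (rho + P) v_a v_b - s P g_ab, diagonal entries only
    lower = [(rho + P) * v_dn[i] ** 2 - s * P * g[i] for i in range(2)]
    mixed = [g_inv[i] * lower[i] for i in range(2)]        # T^a_a
    upper = [g_inv[i] ** 2 * lower[i] for i in range(2)]   # T^aa
    return {"lower": lower, "mixed": mixed, "upper": upper}


if __name__ == "__main__":
    # Arbitrary sample values; compare with A^2*rho, B^2*P, s*rho, -s*P, ...
    print(perfect_fluid_components(rho=3.0, P=0.5, A=2.0, B=4.0, s=1))
```

The mixed-index components come out free of A and B, which is the property exploited in example 5.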


A rope dangling in a Schwarzschild spacetime Example: 5 
Suppose we want to lower a bucket on a rope toward the event 
horizon of a black hole. We have already made some qualitative 
remarks about this idea in example 14 on p. 64. This seemingly 
whimsical example turns out to be a good demonstration of some 
techniques, and can also be used in thought experiments that 
illustrate the definition of mass in general relativity and that probe 
some ideas about quantum gravity.⁵


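The Schwarzschild metric (p. 223) is

$ds^2 = f^2 dt^2 - f^{-2} dr^2 + \ldots$,

where $f = (1 - 2m/r)^{1/2}$, and ... represents angular terms. We
will end up needing the following Christoffel symbols:

$\Gamma^t{}_{tr} = f'/f$
$\Gamma^\theta{}_{\theta r} = \Gamma^\phi{}_{\phi r} = r^{-1}$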


Since the spacetime has spherical symmetry, it ends up being
more convenient to consider a rope whose shape, rather than
being cylindrical, is a cone defined by some set of $(\theta, \phi)$. For
convenience we take this set to cover unit solid angle. The final
results obtained in this way can be readily converted into statements
about a cylindrical rope. We let $\mu$ be the mass per unit
length of the rope, and T the tension. Both of these may depend
on r. The corresponding energy density and tensile stress are
$\rho = \mu/A = \mu/r^2$ and $S = T/A$. To connect this to the stress-energy
tensor, we start by comparing to the case of a perfect fluid from
example 4. Because the rope is made of fibers that have strength
only in the radial direction, we will have $T^{\theta\theta} = T^{\phi\phi} = 0$.
Furthermore, the stress is tensile rather than compressional, corresponding
to a negative pressure. The Schwarzschild coordinates are
orthogonal but not orthonormal, so the properly normalized velocity
of a static observer has a factor of f in it: $v^\alpha = (f^{-1}, 0, 0, 0)$,
or, lowering an index, $v_\alpha = (f, 0, 0, 0)$. The results of example 4
show that the mixed-index form of T will be the most convenient,
since it can be expressed without messy factors of f. We have

$T^\kappa{}_\nu = \operatorname{diag}(\rho, S, 0, 0) = r^{-2} \operatorname{diag}(\mu, T, 0, 0)$.

By writing the stress-energy tensor in this form, which is independent
of t, we have assumed static equilibrium outside the event
horizon. Inside the horizon, the r coordinate is the timelike one,
the spacetime itself is not static, and we do not expect to find
static solutions, for the reasons given on p. 64.


⁵Brown, “Tensile Strength and the Mining of Black Holes,” arxiv.org/abs/1207.3342

Conservation of energy is automatically satisfied, since there is
no time dependence. Conservation of radial momentum is expressed by

$\nabla_\mu T^\mu{}_r = 0$,

or

$0 = \nabla_r T^r{}_r + \nabla_t T^t{}_r + \nabla_\theta T^\theta{}_r + \nabla_\phi T^\phi{}_r$.

It would be tempting to throw away all but the first term, since
T is diagonal, and therefore $T^t{}_r = T^\theta{}_r = T^\phi{}_r = 0$. However, a
covariant derivative can be nonzero even when the symbol being
differentiated vanishes identically. Writing out these four terms,
we have

$0 = \partial_r T^r{}_r + \Gamma^r{}_{rr} T^r{}_r - \Gamma^r{}_{rr} T^r{}_r$
$\quad + \Gamma^t{}_{tr} T^r{}_r - \Gamma^t{}_{tr} T^t{}_t$
$\quad + \Gamma^\theta{}_{\theta r} T^r{}_r$
$\quad + \Gamma^\phi{}_{\phi r} T^r{}_r$,

where each line corresponds to one covariant derivative. Evaluating
this, we have

$0 = T' + \frac{f'}{f} T - \frac{f'}{f} \mu$,

where primes denote differentiation with respect to r. Note that
since no terms of the form $\partial_r T^t{}_t$ occur, this expression is valid
regardless of whether we take $\mu$ to be constant or varying. Thus
we are free to take $\rho \propto r^{-2}$, so that $\mu$ is constant, and this means
that our result is equally applicable to a uniform cylindrical rope.
This result is checked using computer software in example 6.


This is a differential equation that tells us how the tensile stress in
the rope varies along its length. The coefficient $f'/f = m/[r(r-2m)]$
blows up at the event horizon, which is as expected, since we do
not expect to be able to lower the rope to or below the horizon.
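
The closed form of this coefficient follows in two lines from the definition of f. The steps below are a sketch that uses only $f = (1 - 2m/r)^{1/2}$ from the Schwarzschild metric.

```latex
\begin{align*}
f' &= \frac{1}{2}\left(1 - \frac{2m}{r}\right)^{-1/2} \frac{2m}{r^2}
    = \frac{m}{r^2 f}, \\
\frac{f'}{f} &= \frac{m}{r^2 f^2}
    = \frac{m}{r^2 \left(1 - 2m/r\right)}
    = \frac{m}{r(r - 2m)} .
\end{align*}
```

The denominator vanishes at r = 2m, which is the location of the horizon.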


Let’s check the Newtonian limit, where the gravitational field is g
and the potential is $\Phi$. In this limit, we have $f \approx 1 - \Phi$, $f'/f \approx g$
(with g > 0), and $\mu \gg T$, resulting in

$0 = T' - g\mu$,

which is the expected Newtonian relation.


Returning to the full general-relativistic result, it can be shown that
for a loaded rope with no mass of its own, we have a finite result
for $\lim_{r \to \infty} T$, even when the bucket is brought arbitrarily close to
the horizon. (The solution in this case is just $T = T_\infty/f$, where
$T_\infty$ is the tension at $r = \infty$.) However, this is misleading without
the caveat that for $\mu < T$, the speed of transverse waves in the
rope is greater than c, which is not possible for any known form
of matter; it would violate the null energy condition, discussed
in the following section.
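
The behavior of the tension can also be explored numerically. The sketch below is an illustration rather than anything taken from the text: it integrates the rope equation $T' = (f'/f)(\mu - T)$ with a hand-written fourth-order Runge-Kutta stepper, in units where m = 1 by default, and the function names are hypothetical. For constant $\mu$ the equation integrates to $T = \mu + C/f$ with C a constant, as can be confirmed by substitution; the massless case $T = T_\infty/f$ is the special case $\mu = 0$.

```python
import math


def f(r, m=1.0):
    """Schwarzschild factor f = (1 - 2m/r)^(1/2), valid for r > 2m."""
    return math.sqrt(1.0 - 2.0 * m / r)


def fprime_over_f(r, m=1.0):
    """Coefficient f'/f = m / [r (r - 2m)] in the rope equation."""
    return m / (r * (r - 2.0 * m))


def integrate_tension(r_start, r_end, T_start, mu, m=1.0, steps=20000):
    """Integrate T' = (f'/f)(mu - T) from r_start to r_end with RK4.

    mu is taken to be constant here (a uniform rope); r_end may be
    smaller than r_start, in which case the step size is negative.
    """
    h = (r_end - r_start) / steps
    r, T = r_start, T_start

    def rhs(r, T):
        return fprime_over_f(r, m) * (mu - T)

    for _ in range(steps):
        k1 = rhs(r, T)
        k2 = rhs(r + h / 2, T + h * k1 / 2)
        k3 = rhs(r + h / 2, T + h * k2 / 2)
        k4 = rhs(r + h, T + h * k3)
        T += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        r += h
    return T


if __name__ == "__main__":
    # Massless rope: start far out with T = T_inf/f and integrate inward.
    T_inf = 1.0
    T_near = integrate_tension(100.0, 3.0, T_inf / f(100.0), mu=0.0)
    print(T_near, T_inf / f(3.0))  # numerical vs. closed-form tension at r = 3m
```

Integrating inward from large r avoids starting at the horizon, where the coefficient diverges.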


The rope, using computer algebra Example: 6
The result of example 5 can be checked with the following Maxima
code:

load(ctensor);
ct_coords:[t,r,theta,phi];
depends(f,r);
depends(ten,r); /* tension depends on r */
depends(mu,r); /* mass/length depends on r */
lg:matrix([f^2,0,0,0],
          [0,-f^-2,0,0],
          [0,0,-r^2,0],
          [0,0,0,-r^2*sin(theta)^2]);
cmetric();
christof(mcs);
/* stress-energy tensor, T^mu_nu */
t:r^-2*matrix(
    [mu,0,0,0],
    [0,ten,0,0],
    [0,0,0,0],
    [0,0,0,0]
);
/*
Compute covariant derivative of the stress-energy
tensor with respect to its first index. The
function checkdiv is defined so that the first
index has to be covariant (lower); the T I'm
putting in is T^mu_nu, and since it's symmetric,
that's the same as T_mu^nu.
*/
checkdiv(t);


8.1.3 Energy conditions 


Physical theories are supposed to answer questions. For example:


1. Does a small enough physical object always have a world-line 
that is approximately a geodesic? 


2. Do massive stars collapse to form black-hole singularities? 
3. Did our universe originate in a Big Bang singularity? 


4. If our universe doesn’t currently have violations of causality
such as the closed timelike curves exhibited by the Petrov metric
(p. 287), can we be assured that it will never develop causality
violation in the future?

We would like to “prove” whether the answers to questions like these
are yes or no, but physical theories are not formal mathematical
systems in which results can be “proved” absolutely. For example,
the basic structure of general relativity isn’t a set of axioms but a
list of ingredients like the equivalence principle, which has evaded
formal definition.⁶


Even the Einstein field equations, which appear to be completely 
well defined, are not mathematically formal predictions of the be- 
havior of a physical system. The field equations are agnostic on 
the question of what kinds of matter fields contribute to the stress- 
energy tensor. In fact, any spacetime at all is a solution to the 
Einstein field equations, provided we’re willing to admit the corre- 
sponding stress-energy tensor. We can never answer questions like 
the ones above without assuming something about the stress-energy 
tensor. 


In example 14 on page 132, we saw that radiation has $P = \rho/3$
and dust has $P = 0$. Both have $\rho \ge 0$. If the universe is made out
of nothing but dust and radiation, then we can obtain the following
five constraints on the energy-momentum tensor:

trace energy condition       $\rho - 3P \ge 0$
strong energy condition      $\rho + 3P \ge 0$ and $\rho + P \ge 0$
dominant energy condition    $\rho \ge 0$ and $|P| \le \rho$
weak energy condition        $\rho \ge 0$ and $\rho + P \ge 0$
null energy condition        $\rho + P \ge 0$


These are arranged roughly in order from strongest to weakest. They 
all have to do with the idea that negative mass-energy doesn’t seem 
to exist in our universe, i.e., that gravity is always attractive rather 
than repulsive. With this motivation, it would seem that there 
should only be one way to state an energy condition: $\rho > 0$. But
the symbols $\rho$ and P refer to the form of the stress-energy tensor
in a special frame of reference, interpreted as the one that is at rest
relative to the average motion of the ambient matter. (Such a frame
is not even guaranteed to exist unless the matter acts as a perfect
fluid.) In this frame, the tensor is diagonal. Switching to some other
frame of reference, the $\rho$ and P parts of the tensor would mix, and
it might be possible to end up with a negative energy density. The 
weak energy condition is the constraint we need in order to make 
sure that the energy density is never negative in any frame. 


The dominant energy condition is like the weak energy condition, 
but it also guarantees that no observer will see a flux of energy 
flowing at speeds greater than c. 
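
The inequalities in the list of energy conditions are easy to test mechanically for a given density and pressure in the rest frame. The sketch below is illustrative only: the function name `energy_conditions` is hypothetical, the inequalities are taken to be non-strict, and the third sample, a fluid with $P = -\rho$ standing in for a positive cosmological constant, is an assumption brought in from outside this passage.

```python
from fractions import Fraction


def energy_conditions(rho, P):
    """Evaluate the rest-frame energy conditions for density rho, pressure P."""
    return {
        "trace": rho - 3 * P >= 0,
        "strong": rho + 3 * P >= 0 and rho + P >= 0,
        "dominant": rho >= 0 and abs(P) <= rho,
        "weak": rho >= 0 and rho + P >= 0,
        "null": rho + P >= 0,
    }


if __name__ == "__main__":
    samples = {
        "dust": (Fraction(1), Fraction(0)),
        "radiation": (Fraction(1), Fraction(1, 3)),  # exact 1/3 avoids rounding
        "P = -rho": (Fraction(1), Fraction(-1)),     # assumed vacuum-energy model
    }
    for name, (rho, P) in samples.items():
        print(name, energy_conditions(rho, P))
```

Dust and radiation satisfy every condition, radiation sitting exactly on the boundary of the trace condition, while the $P = -\rho$ sample fails only the strong energy condition.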


The strong energy condition essentially states that gravity is
never repulsive; it is violated by the cosmological constant (see
p. 322).

⁶“Theory of gravitation theories: a no-progress report,” Sotiriou, Faraoni,
and Liberati, http://arxiv.org/abs/0707.2748


An electromagnetic wave Example: 7
In example 1 on p. 295, we saw that dust boosted along the x
axis gave a stress-energy tensor

$T^{\mu\nu} = \gamma^2 \rho \begin{pmatrix} 1 & v \\ v & v^2 \end{pmatrix}$,

where we now suppress the y and z parts, which vanish. For
$v \to 1$, this becomes

$T^{\mu\nu} = \rho' \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$,

where $\rho'$ is the energy density as measured in the new frame. As
a source of gravitational fields, this ultrarelativistic dust is
indistinguishable from any other form of matter with v = 1 along the
x axis, so this is also the stress-energy tensor of an electromagnetic
wave with local energy-density $\rho'$, propagating along the x
axis. (For the full expression for the stress-energy tensor of an
arbitrary electromagnetic field, see the Wikipedia article
“Electromagnetic stress-energy tensor.”)


This is a stress-energy tensor that represents a flux of energy at a 
speed equal to c, so we expect it to lie at exactly the limit imposed 
by the dominant energy condition (DEC). Our statement of the 
DEC, however, was made for a diagonal stress-energy tensor, 
which is what is seen by an observer at rest relative to the matter. 
But we know that it’s impossible to have an observer who, as the 
teenage Einstein imagined, rides alongside an electromagnetic 
wave on a motorcycle. One way to handle this is to generalize our 
definition of the energy condition. For the DEC, it turns out that 
this can be done by requiring that the matrix T, when multiplied
by a vector on or inside the future light-cone, gives another vector 
on or inside the cone. 


A less elegant but more concrete workaround is as follows. Returning
to the original expression for the T of boosted dust at
velocity v, we let $v = 1 + \epsilon$, where $|\epsilon| \ll 1$. This gives a
stress-energy tensor that (ignoring multiplicative constants) looks like:

$\begin{pmatrix} 1 & 1+\epsilon \\ 1+\epsilon & 1+2\epsilon \end{pmatrix}$.

If $\epsilon$ is negative, we have ultrarelativistic dust, and we can verify
that it satisfies the DEC by un-boosting back to the rest frame.
To do this explicitly, we can find the matrix’s eigenvectors, which
(ignoring terms of order $\epsilon^2$) are $(1, 1+\epsilon)$ and $(1, 1-\epsilon)$, with
eigenvalues $2 + 2\epsilon$ and 0, respectively. For $\epsilon < 0$, the first of these is
timelike, the second spacelike. We interpret them simply as the
t and x basis vectors of the rest frame in which we originally described
the dust. Using them as a basis, the stress-energy tensor
takes on the form $\operatorname{diag}(2 + 2\epsilon, 0)$. Except for a constant factor that
we didn’t bother to keep track of, this is the original form of the
T in the dust’s rest frame, and it clearly satisfies the DEC, since
$P = 0$.


For $\epsilon > 0$, $v = 1 + \epsilon$ is a velocity greater than the speed of light,
and there is no way to construct a boost corresponding to −v. We
can nevertheless find a frame of reference in which the stress-energy
tensor is diagonal, allowing us to check the DEC. The
expressions found above for the eigenvectors and eigenvalues
are still valid, but now the timelike and spacelike characters of the
two basis vectors have been interchanged. The stress-energy
tensor has the form $\operatorname{diag}(0, 2 + 2\epsilon)$, with $\rho = 0$ and $P > 0$, which
violates the DEC. As in this example, any flux of mass-energy at
speeds greater than c will violate the DEC.


The DEC is obeyed for $\epsilon < 0$ and violated for $\epsilon > 0$, and since
$\epsilon = 0$ gives a stress-energy tensor equal to that of an electromagnetic
wave, we can tell that light is exactly on the border between forms
of matter that fulfill the DEC and those that don’t. Since the DEC
is formulated as a non-strict inequality, it follows that light obeys
the DEC.


No “speed of flux” Example: 8
The foregoing discussion may have encouraged the reader to be- 
lieve that it is possible in general to read off a “speed of energy 
flux” from the value of T at a point. This is not true. 


The difficulty lies in the distinction between flow with and without 
accumulation, which is sometimes valid and sometimes not. In 
springtime in the Sierra Nevada, snowmelt adds water to alpine 
lakes more rapidly than it can flow out, and the water level rises. 
This is flow with accumulation. In the fall, the reverse happens, 
and we have flow with depletion (negative accumulation). 


Figure e/1 shows a second example in which the distinction seems 
valid. Charge is flowing through the lightbulb, but because there 
is no accumulation of charge in the DC circuit, we can’t detect the 
flow by an electrostatic measurement; the wire does not attract 
the tiny bits of paper below it on the table. 


But we know that with different measurements, we could detect 
the flow of charge in e/1. For example, the magnetic field from 
the wire would deflect a nearby magnetic compass. This shows 
that the distinction between flow with and without accumulation 
may be sometimes valid and sometimes invalid. Flow without 
accumulation may or may not be detectable; it depends on the 
physical context. 


In figure e/2, an electric charge and a magnetic dipole are superimposed
at a point. The Poynting vector P defined as E × B is
used in electromagnetism as a measure of the flux of energy, and
it tells the truth, for example, when the sun warms your skin on
a hot day. In e/2, however, all the fields are static. It seems as
though there can be no flux of energy. But that doesn’t mean that
the Poynting vector is lying to us. It tells us that there is a pattern
of flow, but it’s flow without accumulation; the Poynting vector
forms circular loops that close upon themselves, and electromagnetic
energy is transported in and out of any volume at the same
rate. We would perhaps prefer to have a mathematical rule that
gave zero for the flux in this situation, but it's acceptable that our
rule P = E × B gives a nonzero result, since it doesn’t incorrectly
predict an accumulation, which is what would be detectable.


Now suppose we're presented with this stress-energy tensor, measured
at a single point and expressed in some units:

$T^{\mu\nu} = \begin{pmatrix} 4.037 \pm 0.002 & 4.038 \pm 0.002 \\ 4.036 \pm 0.002 & 4.036 \pm 0.002 \end{pmatrix}$.

To within the experimental error bars, it has the right form to be
many different things: (1) We could have a universe filled with
perfectly uniform dust, moving along the x axis at some ultrarelativistic
speed v so great that the $\epsilon$ in $v = 1 - \epsilon$, as in example
7, is not detectably different from zero. (2) This could be a point
sampled from an electromagnetic wave traveling along the x axis.
(3) It could be a point taken from figure e/2. (In cases 2 and 3,
the off-diagonal elements are simply the Poynting vector.)


In cases 1 and 2, we would be inclined to interpret this stress- 
energy tensor by saying that its off-diagonal part measures the 
flux of mass-energy along the x axis, while in case 3 we would re- 
ject such an interpretation. The trouble here is not so much in our 
interpretation of T as in our Newtonian expectations about what is 
or isn’t observable about fluxes that flow without accumulation. In 
Newtonian mechanics, a flow of mass is observable, regardless 
of whether there is accumulation, because it carries momentum 
with it; a flow of energy, however, is undetectable if there is no 
accumulation. The trouble here is that relativistically, we can’t 
maintain this distinction between mass and energy. The Einstein 
field equations tell us that a flow of either will contribute equally to 
the stress-energy, and therefore to the surrounding gravitational 
field. 


The flow of energy in e/2 contributes to the gravitational field, and 
its contribution is changed, for example, if the magnetic field is re- 
versed. The figure is in fact not a bad qualitative representation of 
the spacetime around a rotating, charged black hole. At large dis- 
tances, however, the gravitational effect of the off-diagonal terms 
in T becomes small, because they average to nearly zero over a
sufficiently large spherical region. The distant gravitational field
approaches that of a point mass with the same mass-energy. 


Momentum in static fields Example: 9 
Continuing the train of thought described in example 8, we can 
come up with situations that seem even more paradoxical. In fig- 
ure e/2, the total momentum of the fields vanishes by symmetry. 
This symmetry can, however, be broken by displacing the electric 
charge by AR perpendicular to the magnetic dipole vector D. The 
total momentum no longer vanishes, and now lies in the direction 
of D x AR. But we have proved in example 2 on p. 296 that a 
system’s center of mass-energy is at rest if and only if its total 
momentum is zero. Since this system’s center of mass-energy is 
certainly at rest, where is the other momentum that cancels that 
of the electric and magnetic fields? 


Suppose, for example, that the magnetic dipole consists of a loop 
of copper wire with a current running around it. If we open a 
switch and extinguish the dipole, it appears that the system must 
recoil! This seems impossible, since the fields are static, and an 
electric charge does not interact with a magnetic dipole. 


Babson et al.⁷ have analyzed a number of examples of this type.
In the present one, the mysterious “other momentum” can be at- 
tributed to a relativistic imbalance between the momenta of the 
electrons in the different parts of the wire. A subtle point about 
these examples is that even in the case of an idealized dipole of 
vanishingly small size, it makes a difference what structure we as- 
sume for the dipole. In particular, the field’s momentum is nonzero 
for a dipole made from a current loop of infinitesimal size, but zero 
for a dipole made out of two magnetic monopoles.⁸


Geodesic motion of test particles 


Question 1 on p. 307 was: “Does a small enough physical object 
always have a world-line that is approximately a geodesic?” In other 
words, do Eötvös experiments give null results when carried out in
laboratories using real-world apparatus of small enough size? We
would like something of this type to be true, since general relativity
is based on the equivalence principle, and the equivalence principle is
motivated by the null results of Eötvös experiments. Nevertheless, it
is fairly easy to show that the answer to the question is no, unless we 
make some more specific assumption, such as an energy condition, 
about the system being modeled. 


Before we worry about energy conditions, let’s consider why the
small size of the apparatus is relevant. Essentially this is because of
gravitational radiation. In a gravitationally radiating system such
as the Hulse-Taylor binary pulsar (p. 232), the material bodies lose
energy, and as with any radiation process, the radiated power depends
on the square of the strength of the source. The world-line of
such a body therefore depends on its mass, and this shows that its
world-line cannot be an exact geodesic, since the initially tangent
world-lines of two different masses diverge from one another, and
these two world-lines can’t both be geodesics.

⁷Am. J. Phys. 77 (2009) 826
⁸Milton and Meille, arxiv.org/abs/1208.4826


Let’s proceed to give a rough argument for geodesic motion and 
then try to poke holes in it. When we test geodesic motion, we do 
an Eötvös experiment that is restricted to a certain small region
of spacetime S. Our test-body’s world-line enters S with a certain 
energy-momentum vector p and exits with p’. If spacetime was 
flat, then Gauss’s theorem would hold exactly, and the vanishing 
divergence $\nabla_b T^{ab}$ of the stress-energy tensor would require that the
incoming flux represented by p be exactly canceled by the outgoing 
flux due to p’. In reality spacetime isn’t flat, and it isn’t even possible 
to compare p and p’ except by parallel-transporting one into the 
same location as the other. Parallel transport is path-dependent, 
but if we make the reasonable restriction to paths that stay within S, 
we expect the ambiguity due to path-dependence to be proportional 
to the area enclosed by any two paths, so that if S is small enough, 
the ambiguity can be made small. Ignoring this small ambiguity, 
we can see that one way for the fluxes to cancel would be for the 
particle to travel along a geodesic, since both p and p’ are tangent 
to the test-body’s world-line, and a geodesic is a curve that parallel- 
transports its own tangent vector. Geodesic motion is therefore one 
solution, and we expect the solution to be nearly unique when S is
small. 
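The claim that the ambiguity from path-dependence scales with the enclosed area can be illustrated on a familiar curved space. The following sketch uses a unit 2-sphere rather than a spacetime (an assumption made purely for illustration). It parallel-transports a vector once around a circle of colatitude theta and compares the resulting rotation with the area of the enclosed polar cap. The function names are hypothetical.

```python
import math


def transport_around_latitude(theta, steps=2000):
    """Parallel-transport a unit vector once around the circle of colatitude
    theta on a unit sphere; return the angle (rad) between start and finish.

    In the orthonormal basis (e_theta, e_phi) the transport equations are
        da/dphi =  cos(theta) * b,    db/dphi = -cos(theta) * a,
    integrated here with a fixed-step fourth-order Runge-Kutta scheme.
    """
    k = math.cos(theta)

    def deriv(a, b):
        return k * b, -k * a

    a, b = 1.0, 0.0
    h = 2.0 * math.pi / steps
    for _ in range(steps):
        k1a, k1b = deriv(a, b)
        k2a, k2b = deriv(a + 0.5 * h * k1a, b + 0.5 * h * k1b)
        k3a, k3b = deriv(a + 0.5 * h * k2a, b + 0.5 * h * k2b)
        k4a, k4b = deriv(a + h * k3a, b + h * k3b)
        a += h * (k1a + 2.0 * k2a + 2.0 * k3a + k4a) / 6.0
        b += h * (k1b + 2.0 * k2b + 2.0 * k3b + k4b) / 6.0
    # The starting vector was (1, 0), so the dot product with it is just a.
    return math.acos(max(-1.0, min(1.0, a)))


def cap_area(theta):
    """Area of the polar cap enclosed by the circle of colatitude theta."""
    return 2.0 * math.pi * (1.0 - math.cos(theta))


for theta in (0.4, 0.2, 0.1):
    print(theta, transport_around_latitude(theta), cap_area(theta))
```

For these small loops the mismatch angle equals the enclosed area (in units where the sphere's radius is 1), so shrinking the region shrinks the ambiguity in proportion.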


Although this argument is almost right, it has some problems. 
First we have to ask whether “geodesic” means a geodesic of the full 
spacetime including the object’s own fields, or of the background 
spacetime B that would have existed without the object. The latter 
is the more sensible interpretation, since the question is basically 
asking whether a spacetime can really be defined geometrically, as 
the equivalence principle claims, based on the motion of test
particles inserted into it. We also have to define words like “small
enough” and “approximately;” to do this, we imagine a sequence of
objects $O_n$ that get smaller and smaller as n increases. We then
form the following conjecture, which is meant to formulate question
1 more exactly: Given a vacuum background spacetime B, and a
timelike world-line $\ell$ in B, consider a sequence of spacetimes $S_n$,
formed by inserting the $O_n$ into B, such that: (i) the metric of $S_n$ is
defined on the same points as the metric of B; (ii) $O_n$ moves along
$\ell$, and for any r > 0, there exists some n such that for m > n, $O_m$
is smaller than r;⁹ (iii) the metric of $S_n$ approaches the metric of B


⁹I.e., at any point P on $\ell$, an observer moving along $\ell$ at P defines a surface
of simultaneity K passing through P, and sees the stress-energy tensor of $O_n$ as
vanishing outside of a three-sphere of radius r within K and centered on P.
f / Negative mass.


g / The black sphere is made
of ordinary matter. The 
crosshatched sphere has positive 
gravitational mass and negative 
inertial mass. If the two of them 
are placed side by side in empty 
space, they will both accelerate 
steadily to the right, gradually 
approaching the speed of light. 
Conservation of momentum is 
preserved, because the exotic 
sphere has leftward momentum 
when it moves to the right, so the 
total momentum is always zero. 


as $n \to \infty$. Then $\ell$ is a geodesic of B.


This is almost right but not quite, as shown by the following
counterexample. Papapetrou¹⁰ has shown that a spinning body in
a curved background spacetime deviates from a geodesic with an
acceleration that is proportional to LR, where L is its angular momentum
and R is the Riemann curvature. Let all the $O_n$ have a fixed
value of L, but let the spinning mass be concentrated into a smaller
and smaller region as n increases, so as to satisfy (ii). As the radius
r decreases, the motion of the particles composing an $O_n$ eventually
has to become ultrarelativistic, so that the main contribution to the
gravitational field is from the particles’ kinetic energy rather than
their rest mass. We then have $L \sim pr \sim Er$, so that in order to
keep L constant, we must have $E \propto 1/r$. This causes two problems.
First, it makes the gravitational field blow up at small distances,
violating (iii). Also, we expect that for any known form of matter,
there will come a point (probably the Tolman-Oppenheimer-Volkoff
limit) at which we get a black hole; the singularity is then not part
of the spacetime $S_n$, violating (i). But our failed counterexample
can be patched up. We obtain a supply of exotic matter, whose
gravitational mass is negative, and we mix enough of this mysterious
stuff into each $O_n$ so that the gravitational field shrinks rather
than growing as n increases, and no black hole is ever formed.


Ehlers and Geroch¹¹ have proved that it suffices to require an
additional condition: (iv) The $O_n$ satisfy the dominant energy condition.
This rules out our counterexample.


The Newtonian limit 


In units with $c \neq 1$, a quantity like $\rho + P$ is expressed as $\rho + P/c^2$.
The Newtonian limit is recovered as $c \to \infty$, which makes the pressure
term negligible, so that all the energy conditions reduce to
$\rho \geq 0$. What would it mean if this was violated? Would $\rho < 0$
describe an object with negative inertial mass, which would accelerate
east when you pushed it to the west? Or would it describe
something with negative gravitational mass, which would repel ordinary
matter? We can imagine various possibilities, as shown in
figure f. Anything that didn’t lie on the main diagonal would violate
the equivalence principle, and would therefore be impossible
to accommodate within general relativity’s geometrical description of
gravity. If we had “upsidasium” matter such as that described by 
the second quadrant of the figure (example 2, p. 26), gravity would 
be like electricity, except that like masses would attract and oppo- 
sites repel; we could have gravitational dielectrics and gravitational 
Faraday cages. The fourth quadrant leads to amusing possibilities 
like figure g. 


¹⁰Proc. Royal Soc. London A 209 (1951) 248. The relevant result is summarized
in Misner, Thorne, and Wheeler, Gravitation, p. 1121.
¹¹arxiv.org/abs/gr-qc/0309074v1
No gravitational shielding Example: 10
Electric fields can be completely excluded from a Faraday cage, 
and magnetic fields can be very strongly blocked with high-perme- 
ability materials such as mu-metal. It would be fun if we could do 
the same with gravitational fields, so that we could have zero- 
gravity or near-zero-gravity parties in a specially shielded room. 
It would be a form of antigravity, but a different one than the “upsi- 
dasium” type. Unfortunately this is difficult to do, and the reason 
it’s difficult turns out to be related to the unavailability of materials 
that violate energy conditions. 


First we need to define what we mean by shielding. We restrict 
ourselves to the Newtonian limit, and to one dimension, so that a 
gravitational field is specified by a function of one variable g(x). 
The best kind of shielding would be some substance that we 
could cut with shears and form into a box, and that would ex- 
clude gravitational fields from the interior of the box. This would 
be analogous to a Faraday cage; no matter what external field it 
was embedded in, it would spontaneously adjust itself so that the 
internal field was canceled out. A less desirable kind of shielding 
would be one that we could set up on an ad hoc basis to null out a 
specific, given, externally imposed field. Once we know what the 
external field is, we try to choose some arrangement of masses 
such that the field is nulled out. We will show that even this kind 
of shielding is unachievable, if nulling out the field is interpreted 
to mean this: at some point, which for convenience we take to be 
the origin, we wish to have a gravitational field such that $g(0) = 0$,
$dg/dx(0) = 0$, ..., $d^n g/dx^n(0) = 0$, where n is arbitrarily specified.
For comparison, magnetic fields can be nulled out according to 
this definition by building an appropriately chosen configuration of 
coils such as a Helmholtz coil. 


Since we’re only doing the Newtonian limit, the gravitational field 
is the sum of the fields made by all the sources, and we can take 
this as a sum over point sources. For a point source m placed at
$x_0$, the field g(x) is odd under reflection about $x_0$. The derivative
of the field $g'(x)$ is even. Since $g'$ is even, we can’t control its sign
at x = 0 by choosing $x_0 > 0$ or $x_0 < 0$. The only way to control
the sign of g’ is by choosing the sign of m. Therefore if the sign 
of the externally imposed field’s derivative is wrong, we can never 
null it out. Figure h shows a special case of this theorem. 
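The parity argument above can be checked numerically. This is a minimal sketch, assuming a Newtonian point source on the x axis and a sign convention in which a positive field points toward +x; the function names and the sample mass are hypothetical.

```python
G = 6.674e-11  # m^3 kg^-1 s^-2


def field(m, x0, x):
    """Field at x due to a point mass m at x0 (positive means toward +x)."""
    d = x0 - x
    return G * m * d / abs(d) ** 3


def field_gradient(m, x0, x=0.0, h=1.0e-3):
    """Central-difference estimate of dg/dx at x."""
    return (field(m, x0, x + h) - field(m, x0, x - h)) / (2.0 * h)


for m in (1.0e3, -1.0e3):
    for x0 in (1.0, -1.0):
        g = field(m, x0, 0.0)
        dg = field_gradient(m, x0)
        print(f"m = {m:+.0e} kg, x0 = {x0:+.0f} m:  g(0) = {g:+.2e}  g'(0) = {dg:+.2e}")
```

Moving the source from one side of the origin to the other flips the sign of g(0) but leaves the sign of g'(0) alone; only a change in the sign of m flips g'(0).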


The theorem does not apply to three dimensions, and it does not 
prove that all fields are impossible to null out, only that some are. 
For example, the field inside a hemispherical shell can be nulled 
by adding another hemispherical shell to complete the sphere. I
thank P. Allen for helpful discussion of this topic.




h / Nulling out a gravitational
field is impossible in one dimen- 
sion without exotic matter. 1. The 
planet imposes a nonvanishing 
gravitational field with a nonvan- 
ishing gradient. 2. We can null 
the field at one point in space, by 
placing a sphere of very dense, 
but otherwise normal, matter 
overhead. The stick figure still 
experiences a tidal force, $g' \neq 0$.
3. To change the field’s derivative 
without changing the field, we 
can place two additional masses 
above and below the given point. 
But to change its derivative in the 
desired direction — toward zero 
— we would have to make these 
masses negative. 
A sort of global conservation law


We saw in sec. 4.5.1, p. 148, that although the energy-momentum 
of matter fields is strictly locally conserved, we cannot typically 
extend this to any kind of global conservation law if the spacetime is 
curved. This is because Gauss’s theorem fails in a curved spacetime. 
However, we can make at least some global inferences about energy, 
if we assume the DEC. 


As a concrete example, consider the old steady-state cosmolog- 
ical models (discussed in more detail for their historical interest in 
sec. 8.4, p. 363). In these models, as the universe expands, hydrogen 
atoms spontaneously pop into existence in such a way that there is 
never any dilution of the universe’s matter on the average. The pro- 
ponents of these models realized early on that they would require 
some modification of general relativity. For suppose that we have 
some empty region of spacetime small enough so that curvature can 
be neglected, and then a hydrogen atom appears in it. Spacetime 
itself is not supposed to have a preferred reference frame accord- 
ing to general relativity, but the newborn atom defines one. For an 
observer in the atom’s rest frame, whose time coordinate is t, we 
have $\partial_a T^{at} = \partial T^{tt}/\partial t \neq 0$. The stress-energy tensor has a nonzero
divergence, which cannot happen in a spacetime that is a solution 
of the Einstein field equations. 


The idea here is that although it is possible to trade gravitational 
energy (which is not counted in the stress-energy) for the energy of 
matter fields, we must always do so in such a way that to a local, 
free-falling observer, energy appears to be conserved. This is the 
equivalence principle. 


What I gave above is only an argument that rules out one spe- 
cific example, in which a hydrogen atom spontaneously pops into 
existence. The only fact about this matter field that was used in 
the argument was that it was possible to define a local Minkowski 
frame in which the matter field was at rest. This is equivalent to 
assuming the DEC (as a strict inequality). The DEC ensures that 
the flow of energy is subluminal, so that we can define such a frame. 
The condition can be relaxed to the normal, less strict definition of 
the DEC as a non-strict inequality (see Hawking and Ellis, p. 94, 
or Wald, p. 219), while still allowing the following theorem to be 
proved: if the stress-energy vanishes initially on any spacelike sur- 
face, and if the DEC holds, then the stress-energy also vanishes 
everywhere inside the past and future light cones of this surface. 


Singularity theorems 


An important example of the use of the energy conditions is that 
Hawking and Ellis have proved that under the assumption of the 
strong energy condition, any body that becomes sufficiently com- 
pact will end up forming a singularity. We might imagine that 
the formation of a black hole would be a delicate thing, requiring
perfectly symmetric initial conditions in order to end up with the 
perfectly symmetric Schwarzschild metric. Many early relativists 
thought so, for good reasons. If we look around the universe at 
various scales, we find that collisions between astronomical bodies 
are extremely rare. This is partly because the distances are vast 
compared to the sizes of the objects, but also because conservation 
of angular momentum has a tendency to make objects swing past 
one another rather than colliding head-on. Starting with a cloud of 
objects, e.g., a globular cluster, Newton’s laws make it extremely 
difficult, regardless of the attractive nature of gravity, to pick initial 
conditions that will make them all collide in the future. For one thing,
they would have to have exactly zero total angular momentum. 


Most relativists now believe that this is not the case. General 
relativity describes gravity in terms of the tipping of light cones. 
When the field is strong enough, there is a tendency for the light 
cones to tip over so far that the entire future light-cone points at the 
source of the field. If this occurs on an entire surface surrounding 
the source, it is referred to as a trapped surface. 


To make this notion of light cones “pointing at the source” more
rigorous, we need to define the volume expansion $\Theta$. Let the set of
all points in a spacetime (or some open subset of it) be expressed as
the union of geodesics. This is referred to as a foliation in geodesics,
or a congruence. Let the velocity vector tangent to such a curve
be $u^a$. Then we define $\Theta = \nabla_a u^a$. This is exactly analogous to
the classical notion of the divergence of the velocity field of a fluid,
which is a measure of compression or expansion. Since $\Theta$ is a scalar,
it is coordinate-independent. Negative values of $\Theta$ indicate that the
geodesics are converging, so that volumes of space shrink. A trapped
surface is one on which $\Theta$ is negative when we foliate with lightlike
geodesics oriented outward along normals to the surface.


When a trapped surface forms, any lumpiness or rotation in 
the initial conditions becomes irrelevant, because every particle’s 
entire future world-line lies inward rather than outward. A possi- 
ble loophole in this argument is the question of whether the light 
cones will really tip over far enough. We could imagine that un- 
der extreme conditions of high density and temperature, matter 
might demonstrate unusual behavior, perhaps including a negative 
energy density, which would then give rise to a gravitational repul- 
sion. Gravitational repulsion would tend to make the light cones tip 
outward rather than inward, possibly preventing the collapse to a 
singularity. We can close this loophole by assuming an appropriate 
energy condition. Penrose and Hawking have formalized the above 
argument in the form of a pair of theorems, known as the singularity 
theorems. One of these applies to the formation of black holes, and 
another one to cosmological singularities such as the Big Bang. 
In a cosmological model, it is natural to foliate using world-lines
that are at rest relative to the Hubble flow (or, equivalently,
the world-lines of observers who see a vanishing dipole moment in
the cosmic microwave background). The $\Theta$ we then obtain is positive,
because the universe is expanding. The volume expansion
is $\Theta = 3H_o$, where $H_o \approx 2.3 \times 10^{-18}\ \mathrm{s}^{-1}$ is the Hubble constant
(the fractional rate of change of the scale factor of cosmological distances).
The factor of three occurs because volume is proportional
to the cube of the linear dimensions.
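The factor of three can be checked with the classical fluid analogy mentioned earlier. The sketch below is only a Newtonian, flat-space stand-in for the covariant definition: it treats the Hubble flow as a velocity field v = H r and takes its divergence by finite differences. The function names, the sample point, and the megaparsec conversion factor are my own choices for illustration.

```python
H0 = 2.3e-18                 # s^-1, the value quoted in the text
METERS_PER_MPC = 3.0857e22   # meters in one megaparsec


def hubble_velocity(x, y, z, hubble=H0):
    """Velocity field v = H r of the Hubble flow (Newtonian picture)."""
    return (hubble * x, hubble * y, hubble * z)


def divergence(field, x, y, z, h=1.0e20):
    """Central-difference divergence of a vector field at the point (x, y, z)."""
    dvx = (field(x + h, y, z)[0] - field(x - h, y, z)[0]) / (2.0 * h)
    dvy = (field(x, y + h, z)[1] - field(x, y - h, z)[1]) / (2.0 * h)
    dvz = (field(x, y, z + h)[2] - field(x, y, z - h)[2]) / (2.0 * h)
    return dvx + dvy + dvz


theta = divergence(hubble_velocity, 1.0e24, -2.0e24, 3.0e24)
h0_km_s_mpc = H0 * METERS_PER_MPC / 1.0e3

print(f"Theta / H0 = {theta / H0:.6f}")
print(f"H0 = {h0_km_s_mpc:.1f} km/s/Mpc")
```

Each of the three spatial directions contributes one factor of H to the divergence, which is the same counting as the cube-of-linear-dimensions argument.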


Current status 


The current status of the energy conditions is shaky. Although 
it is clear that all of them hold in a variety of situations, there are 
strong reasons to believe that they are violated at both microscopic 
and cosmological scales, for reasons both classical and
quantum-mechanical.¹² We will see such a violation in the following section.
However, there are general reasons to believe that such violations
cannot be too extreme, or else they would result in instability of the
form of matter in question.¹³


8.1.4 The cosmological constant 


Having included the source term in the Einstein field equations, 
our most important application will be to cosmology. Some of the 
relevant ideas originate long before Einstein. Once Newton had 
formulated a theory of gravity as a universal attractive force, he 
realized that there would be a tendency for the universe to collapse. 
He resolved this difficulty by assuming that the universe was infinite 
in spatial extent, so that it would have no center of symmetry, and 
therefore no preferred point to collapse toward. The trouble with 
this argument is that the equilibrium it describes is unstable. Any 
perturbation of the uniform density of matter breaks the symmetry, 
leading to the collapse of some pocket of the universe. If the radius 
of such a collapsing region is r, then its mass is proportional
to $r^3$, and its gravitational field is proportional to $r^3/r^2 = r$. Since
its acceleration is proportional to its own size, the time it takes to 
collapse is independent of its size. The prediction is that the uni- 
verse will have a self-similar structure, in which the clumping on 
small scales behaves in the same way as clumping on large scales; 
zooming in or out in such a picture gives a landscape that appears 
the same. With modern hindsight, this is actually not in bad agree- 
ment with reality. We observe that the universe has a hierarchical 
structure consisting of solar systems, galaxies, clusters of galaxies, 
superclusters, and so on. Once such a structure starts to condense, 
the collapse tends to stop at some point because of conservation of 


¹²Barcelo and Visser, “Twilight for the energy conditions?,” http://arxiv.org/abs/gr-qc/0205066v1.

¹³Buniy and Hsu, “Instabilities and the null energy condition,” http://arxiv.org/abs/hep-th/0502203.
angular momentum. This is what happened, for example, when our
own solar system formed out of a cloud of gas and dust. 
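The scaling argument for the collapse time can be made quantitative with the standard Newtonian free-fall time of a uniform, pressureless sphere that starts at rest. This is a sketch under those assumptions only; the density and radii are arbitrary illustrative values, and the function names are hypothetical.

```python
import math

G = 6.674e-11  # m^3 kg^-1 s^-2


def free_fall_time(mass, radius):
    """Newtonian time for a pressureless uniform sphere, initially at rest,
    to collapse to a point."""
    return (math.pi / 2.0) * math.sqrt(radius**3 / (2.0 * G * mass))


def collapse_time_at_density(rho, radius):
    """Free-fall time for a sphere of the given radius cut out of a medium
    of uniform density rho."""
    mass = (4.0 / 3.0) * math.pi * rho * radius**3
    return free_fall_time(mass, radius)


RHO = 1.0e-20  # kg/m^3, an arbitrary illustrative density

for radius in (1.0e10, 1.0e16, 1.0e22):
    t = collapse_time_at_density(RHO, radius)
    print(f"r = {radius:.0e} m  ->  collapse time = {t:.4e} s")
```

Because the mass grows as the cube of the radius, the radius cancels and the time depends only on the density, as $\sqrt{3\pi/(32 G \rho)}$.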


Einstein confronted similar issues, but in a more acute form. 
Newton’s symmetry argument, which failed only because of its in- 
stability, fails even more badly in relativity: the entire spacetime 
can simply contract uniformly over time, without singling out any 
particular point as a center. Furthermore, it is not obvious that 
angular momentum prevents total collapse in relativity in the same 
way that it does classically, and even if it did, how would that apply 
to the universe as a whole? Einstein’s Machian orientation would 
have led him to reject the idea that the universe as a whole could 
be in a state of rotation, and in any case it was sensible to start the 
study of relativistic cosmology with the simplest and most symmet- 
ric possible models, which would have no preferred axis of rotation. 


Because of these issues, Einstein decided to try to patch up his 
field equation so that it would allow a static universe. Looking back 
over the considerations that led us to this form of the equation, 
we see that it is very nearly uniquely determined by the following 
criteria: 


1. It should be consistent with experimental evidence for local 
conservation of energy-momentum. 


2. It should satisfy the equivalence principle. 
3. It should be coordinate-independent. 


4. It should be equivalent to Newtonian gravity or “plain” general 
relativity in the appropriate limit. 


5. It should not be overdetermined. 


This is not meant to be a rigorous proof, just a general observation 
that it’s not easy to tinker with the theory without breaking it. 


A failed attempt at tinkering Example: 11 
As an example of the lack of “wiggle room” in the structure of the 
field equations, suppose we construct the scalar $T^a{}_a$, the trace of
the stress-energy tensor, and try to insert it into the field equations
as a further source term. The first problem is that the field
equation involves rank-2 tensors, so we can’t just add a scalar.
To get around this, suppose we multiply by the metric. We then
have something like $G_{ab} = c_1 T_{ab} + c_2 g_{ab} T^c{}_c$, where the two constants
$c_1$ and $c_2$ would be constrained by the requirement that the
theory agree with Newtonian gravity in the classical limit. 


To see why this attempt fails, note that the stress-energy tensor of
an electromagnetic field is traceless, $T^c{}_c = 0$. Therefore a beam
of light’s coupling to gravity in the $c_2$ term is zero. As discussed on
pp. 299-302, empirical tests of conservation of momentum would
therefore constrain $c_2$ to be < 10°.
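The tracelessness of the electromagnetic stress-energy tensor is easy to verify numerically. The following sketch works in flat spacetime with the + − − − signature, builds a random antisymmetric field tensor, and forms the stress-energy tensor up to an overall constant (the normalization and overall sign are omitted on purpose, since they don't affect whether the trace vanishes). The helper names are hypothetical.

```python
import random

# Diagonal of the Minkowski metric in the + - - - signature.
# The metric is diagonal and equal to its own inverse.
ETA = [1.0, -1.0, -1.0, -1.0]


def random_field_tensor(rng):
    """Random antisymmetric F_ab (lower indices)."""
    F = [[0.0] * 4 for _ in range(4)]
    for a in range(4):
        for b in range(a + 1, 4):
            F[a][b] = rng.uniform(-1.0, 1.0)
            F[b][a] = -F[a][b]
    return F


def em_stress_energy(F):
    """T_ab = F_ac F_b^c - (1/4) g_ab F_cd F^cd, up to an overall constant."""
    invariant = sum(F[c][d] ** 2 * ETA[c] * ETA[d]
                    for c in range(4) for d in range(4))
    return [[sum(F[a][c] * F[b][c] * ETA[c] for c in range(4))
             - (0.25 * ETA[a] * invariant if a == b else 0.0)
             for b in range(4)] for a in range(4)]


def trace(T):
    """T^a_a = g^{ab} T_ab for a diagonal metric."""
    return sum(ETA[a] * T[a][a] for a in range(4))


rng = random.Random(0)
light = em_stress_energy(random_field_tensor(rng))
# Dust at rest with unit density, for comparison: only T_tt is nonzero.
dust = [[1.0 if a == b == 0 else 0.0 for b in range(4)] for a in range(4)]

print("trace for an electromagnetic field:", trace(light))
print("trace for dust at rest:", trace(dust))
```

The electromagnetic trace vanishes to within rounding error for any field configuration, while ordinary matter at rest has a trace equal to its density, so a trace term would couple to matter but not to light.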


One way in which we can change the field equation without
violating any of these requirements is to add a term $\Lambda g_{ab}$, giving

$G_{ab} = 8\pi T_{ab} + \Lambda g_{ab}$,

which is what we will refer to as the Einstein field equation.¹⁴ As
we’ll see in example 12 on p. 321, this is consistent with conservation
of energy-momentum (requirement 1 above) if and only if $\Lambda$ is
constant. In example 13 we find that its effects are only significant
on the largest scales, which makes it undetectable, for example, in
solar-system tests (criterion 4). For these reasons $\Lambda$ is referred to as
the cosmological constant. As we’ll see below, Einstein introduced
it in order to make a certain type of cosmology work.


We could also choose to absorb the $\Lambda g_{ab}$ term in the field equations
into the $8\pi T_{ab}$, as if the cosmological constant term were due
to some form of matter. It would then be a perfect fluid (example 
4, p. 304) with a negative pressure, and it would violate the strong 
energy condition (example 14, p. 321). When we think of it this 
way, it’s common these days to refer to it as dark energy. But even 
if we think of it as analogous to a matter field, its constancy means 
that it has none of its own independent degrees of freedom. It can’t 
vibrate, rotate, flow, be compressed or rarefied, heated or cooled. 
It acts like a kind of energy that is automatically built in to every 
cubic centimeter of space. This is closely related to the fact that 
its contribution to the stress-energy tensor is proportional to the 
metric. One way of stating the equivalence principle (requirement 
2 above) is that space itself does not come equipped with any other 
tensor besides the metric. 


Einstein originally introduced a positive cosmological constant 
because he wanted relativity to be able to describe a static universe. 
To see why it would have this effect, compare its behavior with that 
of an ordinary fluid. When an ordinary fluid, such as the exploding 
air-gas mixture in a car’s cylinder, expands, it does work on its en- 
vironment, and therefore by conservation of energy its own internal 
energy is reduced. A positive cosmological constant, however, acts 
like a certain amount of mass-energy built into every cubic meter of 
vacuum. Thus when it expands, it releases energy. Its pressure is 
negative. 


Now consider the following semi-relativistic argument. Although 
we’ve already seen (page 229) that there is no useful way to sepa- 
rate the roles of kinetic and potential energy in general relativity, 
suppose that there are some quantities analogous to them in the 
description of the universe as a whole. (We’ll see below that the 


¹⁴In books that use a −+++ metric rather than our +−−−, the sign of the
cosmological constant term is reversed relative to ours.
universe’s contraction and expansion is indeed described by a set of
differential equations that can be interpreted in essentially this way.) 
If the universe contracts, a cubic meter of space becomes less than 
a cubic meter. The cosmological-constant energy associated with 
that volume is reduced, so some energy has been consumed. The 
kinetic energy of the collapsing matter goes down, and the collapse 
is decelerated. 


Cosmological constant must be constant Example: 12 
If $\Lambda$ is thought of as a form of matter, then it becomes natural
to ask whether it’s spread more thickly in some places than others:
is the cosmological “constant” really constant? The following
argument shows that it cannot vary. The field equations are
$G_{ab} = 8\pi T_{ab} + \Lambda g_{ab}$. Taking the divergence of both sides, we have
$\nabla^a G_{ab} = 8\pi \nabla^a T_{ab} + \nabla^a(\Lambda g_{ab})$. The left-hand side vanishes (see
p. 295). Since laboratory experiments have verified conservation
of mass-energy to high precision for all the forms of matter represented
by T, we have $\nabla^a T_{ab} = 0$ as well. Applying the product
rule to the term $\nabla^a(\Lambda g_{ab})$, we get $g_{ab}\nabla^a\Lambda + \Lambda\nabla^a g_{ab}$. But the
covariant derivative of the metric vanishes, so the result is simply
$\nabla_b\Lambda$, which must therefore be zero. Thus any variation in the
cosmological constant over space
or time violates the field equations, and the violation is equivalent
to the violation we would get from a form of matter that didn’t
conserve mass-energy locally.


Cosmological constant is cosmological Example: 13 
The addition of the $\Lambda$ term constitutes a change to the vacuum
field equations, and the good agreement between theory and experiment
in the case of, e.g., Mercury’s orbit puts an upper limit on
$\Lambda$, implying that it must be small. For an order-of-magnitude
estimate, consider that $\Lambda$ has units of mass density, and the only
parameters with units that appear in the description of Mercury’s
orbit are the mass of the sun, m, and the radius of Mercury’s orbit,
r. The relativistic corrections to Mercury’s orbit are on the
order of $v^2$, or about $10^{-8}$, and they come out right. Therefore we
can estimate that the cosmological constant could not have been
greater than about $(10^{-8})m/r^3 \sim 10^{-10}\ \mathrm{kg/m^3}$, or it would have
caused noticeable discrepancies. This is a very poor bound; if $\Lambda$
was this big, we might even be able to detect its effects in laboratory
experiments. Looking at the role played by r in the estimate,
we see that the upper bound could have been made tighter by
increasing r. Observations on galactic scales, for example, constrain
it much more tightly. This justifies the description of $\Lambda$ as
cosmological: the larger the scale, the more significant the effect
of a nonzero $\Lambda$ would be.
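The order-of-magnitude arithmetic in this example can be reproduced in a few lines. This is only a sketch of the same crude estimate, not a real bound; the solar mass, Mercury's mean orbital radius, and the use of a circular-orbit speed are standard round values that I am supplying as assumptions.

```python
import math

G = 6.674e-11         # m^3 kg^-1 s^-2
C = 2.998e8           # m/s
M_SUN = 1.989e30      # kg
R_MERCURY = 5.79e10   # m, mean radius of Mercury's orbit

# Circular-orbit speed, used only to estimate the size of v^2 in units of c^2.
v = math.sqrt(G * M_SUN / R_MERCURY)
beta_squared = (v / C) ** 2

# Crude upper bound on Lambda, expressed as a mass density.
lambda_bound = beta_squared * M_SUN / R_MERCURY**3

print(f"v^2/c^2 is about {beta_squared:.1e}")
print(f"bound on Lambda is about {lambda_bound:.1e} kg/m^3")
```

Since the bound scales as $1/r^3$ at fixed mass, any system that is much larger than Mercury's orbit gives a far tighter constraint, which is the point of the example.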


Energy conditions Example: 14 
Since the right-hand side of the field equation is $8\pi T_{ab} + \Lambda g_{ab}$,
it is possible to consider the cosmological constant as a type of
matter contributing to the stress-energy tensor. We then have
$\rho = -P = \Lambda/8\pi$. As described in more detail in section 8.2.11 on
p. 354, we now know that $\Lambda$ is positive. With $\Lambda > 0$, the weak
and dominant energy conditions are both satisfied, so that in every
frame of reference, $\rho$ is positive and there is no flux of energy
flowing at speeds greater than c. The negative pressure does vi- 
olate the strong energy condition, meaning that the constant acts 
as a form of gravitational repulsion. If the cosmological constant 
is a product of the quantum-mechanical structure of the vacuum, 
then this violation is not too surprising, because quantum fields 
are known to violate various energy conditions. For example, the 
energy density between two parallel conducting plates is negative 
due to the Casimir effect. 
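The bookkeeping in this example can be automated. The sketch below assumes the usual perfect-fluid forms of the energy conditions, with an isotropic pressure P and units in which c = 1; the function name is hypothetical, and the inequalities are the standard textbook ones rather than anything specific to this book's presentation.

```python
def energy_conditions(rho, P):
    """Standard perfect-fluid energy conditions, in units with c = 1."""
    return {
        "null": rho + P >= 0,
        "weak": rho >= 0 and rho + P >= 0,
        "strong": rho + P >= 0 and rho + 3 * P >= 0,
        "dominant": rho >= abs(P),
    }


# Cosmological constant with Lambda > 0: rho = -P > 0 (arbitrary units).
cosmological_constant = energy_conditions(1.0, -1.0)
# Ordinary pressureless matter, for comparison.
dust = energy_conditions(1.0, 0.0)

for name, result in (("Lambda > 0", cosmological_constant), ("dust", dust)):
    print(name, result)
```

With $\rho = -P > 0$, only the strong energy condition fails, because $\rho + 3P = -2\rho$ is negative.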


8.2 Cosmological solutions 


We are thus led to pose two interrelated questions. First, what 
can empirical observations about the universe tell us about the laws 
of physics, such as the zero or nonzero value of the cosmological 
constant? Second, what can the laws of physics, combined with 
observation, tell us about the large-scale structure of the universe, 
its origin, and its fate? 


8.2.1 Evidence for the finite age of the universe 


We have a variety of evidence that the universe’s existence does 
not stretch for an unlimited time into the past. 


When astronomers view light from the deep sky that has been 
traveling through space for billions of years, they observe a uni- 
verse that looks different from today’s. For example, quasars were 
common in the early universe but are uncommon today. 


In the present-day universe, stars use up deuterium nuclei, but 
there are no known processes that could replenish their supply. We 
therefore expect that the abundance of deuterium in the universe 
should decrease over time. If the universe had existed for an infinite 
time, we would expect that all its deuterium would have been lost, 
and yet we observe that deuterium does exist in stars and in the 
interstellar medium. 


The second law of thermodynamics predicts that any system 
should approach a state of thermodynamic equilibrium, and yet our 
universe is very far from thermal equilibrium, as evidenced by the 
fact that our sun is hotter than interstellar space, or by the existence 
of functioning heat engines such as your body or an automobile 
engine. 


With hindsight, these observations suggest that we should not 
look for cosmological models that persist for an infinite time into 
the past. 
8.2.2 Evidence for expansion of the universe


We don’t only see time-variation in locally observable quantities 
such as quasar abundance, deuterium abundance, and entropy. In 
addition, we find empirical evidence for global changes in the uni- 
verse. By 1929, Edwin Hubble at Mount Wilson had determined 
that the universe was expanding, and historically this was the first 
convincing evidence that Einstein’s original goal of modeling a static 
cosmology had been a mistake. Einstein later referred to the cosmological
constant as the “greatest blunder of my life,” and for the
next 70 years it was commonly assumed that $\Lambda$ was exactly zero.


Since we observe that the universe is expanding, the laws of ther- 
modynamics require that it also be cooling, just as the exploding 
air-gas mixture in a car engine’s cylinder cools as it expands. If the 
universe is currently expanding and cooling, it is natural to imagine 
that in the past it might have been very dense and very hot. This is 
confirmed directly by looking up in the sky and seeing radiation from 
the hot early universe. In 1964, Penzias and Wilson at Bell Labora- 
tories in New Jersey detected a mysterious background of microwave 
radiation using a directional horn antenna. As with many acciden- 
tal discoveries in science, the important thing was to pay attention 
to the surprising observation rather than giving up and moving on 
when it confounded attempts to understand it. They pointed the 
antenna at New York City, but the signal didn’t increase. The ra- 
diation didn’t show a 24-hour periodicity, so it couldn’t be from a 
source in a certain direction in the sky. They even went so far as to 
sweep out the pigeon droppings inside. It was eventually established 
that the radiation was coming uniformly from all directions in the 
sky and had a black-body spectrum with a temperature of about 3 K.


This is now interpreted as follows. At one time, the universe 
was hot enough to ionize matter. An ionized gas is opaque to light, 
since the oscillating fields of an electromagnetic wave accelerate the 
charged particles, depositing kinetic energy into them. Once the 
universe became cool enough, however, matter became electrically 
neutral, and the universe became transparent. Light from this time 
is the most long-traveling light that we can detect now. The latest 
data show that transparency set in when the temperature was about 
3000 K. The surface we see, dating back to this time, is known as 
the surface of last scattering. Since then, the universe has expanded 
by about a factor of 1000, causing the wavelengths of photons to 
be stretched by the same amount due to the expansion of the underlying
space. This can equally well be described as a Doppler shift due to the
source’s motion away from us; the two explanations are equivalent.
We therefore see the 3000 K optical black-body radiation red-shifted 
to 3 K, in the microwave region. 
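The numbers in this paragraph fit together as follows. The sketch relies on two standard facts that are not derived here: stretching every wavelength by a common factor turns a black-body spectrum into another black-body spectrum whose temperature is lower by that factor, and Wien's displacement law gives the wavelength at which the spectrum peaks. The function names are hypothetical.

```python
WIEN_B = 2.898e-3   # m K, Wien displacement constant


def redshifted_temperature(t_emit, stretch):
    """Black-body temperature after all wavelengths are stretched by `stretch`."""
    return t_emit / stretch


def wien_peak(temperature):
    """Wavelength (m) at which a black-body spectrum peaks."""
    return WIEN_B / temperature


T_EMIT = 3000.0    # K, temperature when the universe became transparent
STRETCH = 1000.0   # expansion factor since then, as quoted in the text

t_now = redshifted_temperature(T_EMIT, STRETCH)
print(f"temperature today: {t_now:.1f} K")
print(f"peak wavelength at emission: {wien_peak(T_EMIT):.2e} m")
print(f"peak wavelength today:       {wien_peak(t_now):.2e} m")
```

The peak moves from about a micrometer at emission to about a millimeter today, which is why the radiation is detected with microwave antennas.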


It is logically possible to have a universe that is expanding but 


a / The horn antenna used by Penzias and Wilson.
whose local properties are nevertheless static, as in the steady-state
model of Fred Hoyle, in which some novel physical process sponta- 
neously creates new hydrogen atoms, preventing the infinite dilution 
of matter over the universe’s history, which in this model extends 
infinitely far into the past. But we have already seen strong empiri- 
cal evidence that the universe’s local properties (quasar abundance, 
etc.) are changing over time. The CMB is an even more extreme 
and direct example of this; the universe full of hot, dense gas that 
emitted the CMB is clearly nothing like today’s universe. A brief 
discussion of the steady-state model is given in section 8.4, p. 363. 


8.2.3 Evidence for homogeneity and isotropy 


These observations demonstrate that the universe is not homo- 
geneous in time, i.e., that one can observe the present conditions of 
the universe (such as its temperature and density), and infer what 
epoch of the universe’s evolution we inhabit. A different question 
is the Copernican one of whether the universe is homogeneous in 
space. Surveys of distant quasars show that the universe has very 
little structure at scales greater than a few times $10^{25}$ m. (This can
be seen on a remarkable logarithmic map constructed by Gott et 
al., astro.princeton.edu/universe.) This suggests that we can, 
to a good approximation, model the universe as being isotropic (the 
same in all spatial directions) and homogeneous (the same at all 
locations in space). (Isotropy does not follow from homogeneity. 
Examples of homogeneous but anisotropic cosmologies include
rotating cosmologies and the Kantowski-Sachs metric, problem 13,
p. 368.) 


Further evidence comes from the extreme uniformity of the cos- 
mic microwave background radiation, once one subtracts out the 
dipole anisotropy due to the Doppler shift arising from our galaxy’s 
motion relative to the CMB. When the CMB was first discovered, 
there was doubt about whether it was cosmological in origin (rather 
than, say, being associated with our galaxy), and it was expected 
that its anisotropy would be as large as 10%. As physicists began to
be convinced that it really was a relic of the early universe, interest 
focused on measuring this anisotropy, and a series of measurements 
put tighter and tighter upper bounds on it. 


Other than the dipole term, there are two ways in which one 
might naturally expect anisotropy to occur. There might have 
been some lumpiness in the early universe, which might have served 
as seeds for the condensation of galaxy clusters out of the cosmic 
medium. Furthermore, we might wonder whether the universe as a 
whole is rotating. The general-relativistic notion of rotation is very 
different from the Newtonian one, and in particular, it is possible 
to have a cosmology that is rotating without having any center of 
rotation (see problem 6, p. 291). In fact one of the first exact
solutions discovered for the Einstein field equations was the Gödel
metric, which described a bizarre rotating universe with closed timelike
curves, i.e., one in which causality was violated. In a rotating uni- 
verse, one expects that radiation received from great cosmological 
distances will have a transverse Doppler shift, i.e., a shift originat- 
ing from the time dilation due to the motion of the distant matter 
across the sky. This shift would be greatest for sources lying in the 
plane of rotation relative to us, and would vanish for sources lying 
along the axis of rotation. The CMB would therefore show variation
with the form of a quadrupole term, $3\cos^2\theta - 1$. In 1977 a U-2
spyplane (the same type involved in the 1960 U.S.-Soviet incident)
was used by Smoot et al.¹⁵ to search for anisotropies in the CMB.
This experiment was the first to definitively succeed in detecting
the dipole anisotropy. After subtraction of the dipole component,
the CMB was found to be uniform at the level of $\sim 3\times10^{-4}$. This
provided strong support for homogeneous cosmological models, and
ruled out rotation of the universe with $\omega \gtrsim 10^{-22}$ Hz.


8.2.4 The FRW cosmologies 
The FRW metric and the standard coordinates 


Motivated by Hubble’s observation that the universe is expand- 
ing, we hypothesize the existence of solutions of the field equation 
in which the properties of space are homogeneous and isotropic, but 
the over-all scale of space is increasing as described by some scale 
function a(t). Because of coordinate invariance, the metric can still 
be written in a variety of forms. One such form is 


$$ds^2 = dt^2 - a(t)^2\,d\ell^2,$$
where the spatial part is


$$d\ell^2 = f(r)\,dr^2 + r^2\,d\theta^2 + r^2\sin^2\theta\,d\phi^2.$$


To interpret the coordinates, we note that if an observer is able 
to determine the functions a and f for her universe, then she can 
always measure some scalar curvature such as the Ricci scalar or 
the Kretschmann invariant, and since these are proportional to a
raised to some power, she can determine a and t. This shows that 
t is a “look-out-the-window” time, i.e., a time coordinate that we 
can determine by looking out the window and observing the present 
conditions in the universe. Because the quantity being measured di- 
rectly is a scalar, the result is independent of the observer’s state of 
motion. (In practice, these scalar curvatures are difficult to measure
directly, so we measure something else, like the sky-wide average
temperature of the cosmic microwave background.) Simultaneity is
supposed to be ill-defined in relativity, but the look-out-the-window
time defines a notion of simultaneity that is the most naturally
interesting one in this spacetime. With this particular definition of
simultaneity, we can also define a preferred state of rest at any
location in spacetime, which is the one in which t changes as slowly as
possible relative to one’s own clock. This local rest frame, which is
more easily determined in practice as the one in which the microwave
background is most uniform across the sky, can also be interpreted
as the one that is moving along with the Hubble flow, i.e., the
average motion of the galaxies, photons, or whatever else inhabits the
spacetime. The time t is interpreted as the proper time of a particle
that has always been locally at rest. The spatial distance measured
by $L = \int a\,d\ell$ is called the proper distance. It is the distance that
would be measured by a chain of rulers, each of them “at rest” in
the above sense.


¹⁵ G. F. Smoot, M. V. Gorenstein, and R. A. Muller, “Detection of Anisotropy
in the Cosmic Blackbody Radiation,” Phys. Rev. Lett. 39 (1977) 898. The
interpretation of the CMB measurements is somewhat model-dependent; in the
early years of observational cosmology, it was not even universally accepted that
the CMB had a cosmological origin. The best model-independent limit on the
rotation of the universe comes from observations of the solar system, Clemence,
“Astronomical Time,” Rev. Mod. Phys. 29 (1957) 2.


These coordinates are referred to as the “standard” cosmological
coordinates; one will also encounter other choices, such as the co- 
moving and conformal coordinates, which are more convenient for 
certain purposes. Historically, the solution for the functions a and 
f was found by de Sitter in 1917. 


The spatial metric 


The unknown function f(r) has to give a 3-space metric $d\ell^2$
with a constant Einstein curvature tensor. The following Maxima 
program computes the curvature. 


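load(ctensor);                       /* tensor-component package */
dim:3;                               /* three dimensions, not the default four */
ct_coords:[r,theta,phi];
depends(f,r);                        /* f is an unknown function of r */
lg:matrix([f,0,0],
          [0,r^2,0],
          [0,0,r^2*sin(theta)^2]);   /* covariant 3-space metric */
cmetric();                           /* compute the inverse metric */
einstein(true);                      /* compute and display the Einstein tensor */


Line 2 tells Maxima that we’re working in a space with three
dimensions rather than its default of four. Line 4 tells it that f is a
function of r. Line 9 uses its built-in function for computing the
Einstein tensor $G^a{}_b$. The radial component is
$G^r{}_r = (1-1/f)/r^2$, which does not involve the derivative of f. This
has to be constant, and since scaling can be absorbed in the factor a(t)
in the 3+1-dimensional metric, we can just set the value of $G^r{}_r$ more
or less arbitrarily, except for its sign. The result is
$f = 1/(1-kr^2)$, where k = -1, 0, or 1. (The angular components
$G^\theta{}_\theta$ and $G^\phi{}_\phi$ do involve df/dr, and for this f
they take the same constant value as $G^r{}_r$.)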
The resulting metric, called the Robertson-Walker metric, is


$$ds^2 = dt^2 - a^2\left(\frac{dr^2}{1-kr^2} + r^2\,d\theta^2 + r^2\sin^2\theta\,d\phi^2\right).$$


The form of $d\ell^2$ shows us that k can be interpreted in terms of
the sign of the spatial curvature. We recognize the k = 0 metric
as having a flat spatial geometry described in spherical coordinates.
To interpret the $k \neq 0$ cases, we note that a circle at coordinate r
has proper circumference $C = 2\pi a r$ and proper radius
$R = a\int_0^r \sqrt{f(r')}\,dr'$.
For k < 0, we have f < 1 and $C > 2\pi R$, indicating negative spatial
curvature. For k > 0 there is positive curvature.
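
The comparison between circumference and radius can be checked numerically. The sketch below is only an illustration, with hypothetical function names and an arbitrarily chosen sample radius; it evaluates $R = a\int_0^r \sqrt{f}\,dr'$ with $f = 1/(1-kr'^2)$ by the midpoint rule and prints the ratio $C/2\pi R$ for each sign of k.

```python
import math

def proper_radius(r, k, a=1.0, steps=20000):
    """R = a * integral from 0 to r of sqrt(f(r')) dr', with f = 1/(1 - k r'^2).

    The integral is approximated with the midpoint rule.
    """
    h = r / steps
    total = 0.0
    for i in range(steps):
        rp = (i + 0.5) * h                       # midpoint of the i-th slice
        total += h / math.sqrt(1.0 - k * rp * rp)
    return a * total

def circumference_ratio(r, k, a=1.0):
    """C / (2 pi R) for the circle at coordinate r, where C = 2 pi a r."""
    circumference = 2.0 * math.pi * a * r
    return circumference / (2.0 * math.pi * proper_radius(r, k, a))

r_sample = 0.5   # hypothetical coordinate radius; k = +1 requires r < 1
for k in (-1, 0, 1):
    print(k, circumference_ratio(r_sample, k))
# k = -1 gives a ratio above 1, k = 0 gives 1, and k = +1 gives a ratio below 1.
```

For $k = \pm 1$ the integral has the closed forms $a\sin^{-1}r$ and $a\sinh^{-1}r$, which the numerical result should reproduce.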


Let’s examine the positive-curvature case more closely. Suppose 
we select a particular plane of simultaneity defined by t = constant 
and $\phi = \pi/2$, and we start doing geometry in this plane. In two
spatial dimensions, the Riemann tensor only has a single independent
component, which can be identified with the Gaussian curvature 
(sec. 5.4, p. 168), and when this Gaussian curvature is positive and 
constant, it can be interpreted as the angular defect of a triangle 
per unit area (sec. 5.3, p. 162). Since the sum of the interior
angles of a triangle can never be greater than $3\pi$, we have an upper
limit on the area of any triangle. This happens because the positive- 
curvature Robertson-Walker metric represents a cosmology that is 
spatially finite. At a given t, it is the three-dimensional analogue 
of a two-sphere. On a two-sphere, if we set up polar coordinates 
with a given point arbitrarily chosen as the origin, then we know 
that the r coordinate must “wrap around” when we get to the an- 
tipodes. That is, there is a coordinate singularity there. (We know 
it can only be a coordinate singularity, because if it wasn’t, then the 
antipodes would have special physical characteristics, but the FRW 
model was constructed to be spatially homogeneous.) This “wrap- 
around” behavior is described by saying that the model is closed. 


In the negative-curvature case, there is no limit on distances
(figure b/3). Such a universe is called open. In the case of an open universe,
it is particularly easy to demonstrate a fact that bothers many stu- 
dents, which is that proper distances can grow at rates exceeding c. 
Let particles A and B both be at rest relative to the Hubble flow. 
The proper distance between them is then given by $L = a\ell$, where
$\ell = \int d\ell$ is constant. Then differentiating L with respect to the
look-out-the-window time t gives $dL/dt = \dot{a}\ell$. In an open universe,
there is no limit on the size of $\ell$, so at any given time, we can make
$dL/dt$ as large as we like. This does not violate special relativity,
since it is only locally that special relativity is a valid approximation
to general relativity. Because GR only supplies us with frames of
reference that are local, the velocity of two objects relative to one
another is not even uniquely defined; our choice of $dL/dt$ was just
one of infinitely many possible definitions.
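
To get a feeling for the scale at which this happens, note that $dL/dt = \dot{a}\ell = (\dot{a}/a)L$. The following sketch evaluates this for a few distances. The expansion rate of 70 km/s per megaparsec is an assumed illustrative value that does not come from this text, and the function names are hypothetical.

```python
C_KM_S = 299792.458   # speed of light in km/s
H_ASSUMED = 70.0      # assumed value of a-dot/a, in km/s per Mpc (not from the text)

def recession_rate(distance_mpc, hubble=H_ASSUMED):
    """dL/dt = (a-dot/a) * L for two particles at rest in the Hubble flow."""
    return hubble * distance_mpc

def critical_distance(hubble=H_ASSUMED):
    """Proper distance in Mpc at which dL/dt equals c."""
    return C_KM_S / hubble

print(critical_distance())   # roughly 4.3e3 Mpc
for distance in (100.0, 4000.0, 10000.0):
    # The ratio (dL/dt)/c exceeds 1 beyond the critical distance.
    print(distance, recession_rate(distance) / C_KM_S)
```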


b / 1. In the Euclidean plane, this triangle can be scaled by any
factor while remaining similar to itself. 2. In a plane with positive
curvature, geometrical figures have a maximum area and maximum linear
dimensions. This triangle has almost the maximum area, because the sum
of its angles is nearly $3\pi$. 3. In a plane with negative curvature,
figures have a maximum area but no maximum linear dimensions. This
triangle has almost the maximum area, because the sum of its angles is
nearly zero. Its vertices, however, can still be separated from one
another without limit.
The distinction between closed and open universes is not just
a matter of geometry, it’s a matter of topology as well. Just as a 
two-sphere cannot be made into a Euclidean plane without cutting 
or tearing, a closed universe is not topologically equivalent to an 
open one. The correlation between local properties (curvature) and 
global ones (topology) is a general theme in differential geometry. 
A universe that is open is open forever, and similarly for a closed 
one. 


The Friedmann equations 


Having fixed f(r), we can now see what the field equation tells 
us about a(t). The next program computes the Einstein tensor for 
the full four-dimensional spacetime: 


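load(ctensor);
ct_coords:[t,r,theta,phi];
depends(a,t);                  /* the scale factor a is a function of t */
/* covariant FRW metric, signature +--- */
lg:matrix([1,0,0,0],
          [0,-a^2/(1-k*r^2),0,0],
          [0,0,-a^2*r^2,0],
          [0,0,0,-a^2*r^2*sin(theta)^2]);
cmetric();
einstein(true);                /* mixed Einstein tensor */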
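The result is

$$G^t{}_t = 3\left(\frac{\dot a}{a}\right)^2 + 3ka^{-2}$$
$$G^r{}_r = G^\theta{}_\theta = G^\phi{}_\phi = 2\frac{\ddot a}{a} + \left(\frac{\dot a}{a}\right)^2 + ka^{-2},$$

where dots indicate differentiation with respect to time.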


Since we have $G^a{}_b$ with mixed upper and lower indices, we either
have to convert it into $G_{ab}$, or write out the field equations in this
mixed form. The latter turns out to be simpler. In terms of mixed
indices, $g^a{}_b$ is always simply diag(1,1,1,1). Arbitrarily singling out
r = 0 for simplicity, we have $g = \operatorname{diag}(1,-a^2,0,0)$. The
stress-energy tensor is $T^\mu{}_\nu = \operatorname{diag}(\rho,-P,-P,-P)$. (See example 4 on
p. 304 for the signs.) Substituting into $G^a{}_b = 8\pi T^a{}_b + \Lambda g^a{}_b$, we find


$$3\left(\frac{\dot a}{a}\right)^2 + 3ka^{-2} - \Lambda = 8\pi\rho$$
$$2\frac{\ddot a}{a} + \left(\frac{\dot a}{a}\right)^2 + ka^{-2} - \Lambda = -8\pi P.$$


Rearranging a little, we have a set of differential equations known
as the Friedmann equations,


$$\frac{\ddot a}{a} = \frac{1}{3}\Lambda - \frac{4\pi}{3}(\rho + 3P)$$
$$\left(\frac{\dot a}{a}\right)^2 = \frac{1}{3}\Lambda + \frac{8\pi}{3}\rho - ka^{-2}.$$


The cosmology that results from a solution of these differential 
equations is known as the Friedmann-Robertson-Walker (FRW) or 
Friedmann-Lemaître-Robertson-Walker (FLRW) cosmology.


The first Friedmann equation describes the rate at which cosmo- 
logical expansion accelerates or decelerates. Let’s refer to it as the 
acceleration equation. It expresses the basic idea of the field equa- 
tions, which is that non-tidal curvature (left-hand side) is caused 
by the matter that is present locally (right-hand side). Example 15 
illustrates this in a simple case. 


The second Friedmann equation tells us the magnitude of the 
rate of expansion or contraction. Call it the velocity equation. The 
quantity $\dot a/a$, evaluated at the present cosmological time, is the
Hubble constant $H_0$ (which is constant only in the sense that at a
fixed time, it is a constant of proportionality between distance and 
recession velocity). 


To the practiced eye, it seems odd to have two dynamical laws, 
one predicting velocity and one acceleration. The analogous laws in 
freshman mechanics would be Newton’s second law, which predicts 
acceleration, and conservation of energy, which predicts velocity. 
Newton’s laws and conservation of energy are not independent, and 
for mechanical systems either can be derived from the other. The 
Friedmann equations, however, are not overdetermined or redun- 
dant. They are underdetermined, because we want to predict three 
unknown functions of time: a, $\rho$, and P. Since there are only two
equations, they are not sufficient to uniquely determine a solution
for all three functions. The third constraint comes in the form of
some type of equation of state for the matter described by $\rho$ and P,
which in simple models can often be written in the form $P = w\rho$.
For example, dust has w = 0. 
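
To see how an equation of state closes the system, here is a minimal numerical sketch, not a definitive implementation. It assumes k = 0 and $\Lambda = 0$, uses the acceleration equation to evolve a, and evolves $\rho$ with the relation $\dot\rho = -3(\dot a/a)(\rho+P)$. That relation is not written out in the text; it follows from differentiating the velocity equation and substituting the acceleration equation. All function names and the initial conditions are hypothetical choices made for illustration.

```python
import math

def derivatives(state, w, lam=0.0):
    """Time derivatives of (a, a_dot, rho) for the equation of state P = w*rho."""
    a, a_dot, rho = state
    pressure = w * rho
    # Friedmann acceleration equation, solved for the second derivative of a.
    a_ddot = a * (lam / 3.0 - (4.0 * math.pi / 3.0) * (rho + 3.0 * pressure))
    # Obtained by differentiating the velocity equation and using the
    # acceleration equation; not written out in the text.
    rho_dot = -3.0 * (a_dot / a) * (rho + pressure)
    return (a_dot, a_ddot, rho_dot)

def rk4_step(state, dt, w):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def nudge(s, d, h):
        return tuple(si + h * di for si, di in zip(s, d))
    k1 = derivatives(state, w)
    k2 = derivatives(nudge(state, k1, dt / 2.0), w)
    k3 = derivatives(nudge(state, k2, dt / 2.0), w)
    k4 = derivatives(nudge(state, k3, dt), w)
    return tuple(s + dt / 6.0 * (d1 + 2.0 * d2 + 2.0 * d3 + d4)
                 for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4))

def evolve(w, t_end=2.0, steps=2000):
    """Integrate from a = 1 with k = 0 and Lambda = 0; returns (a, a_dot, rho)."""
    rho_start = 3.0 / (8.0 * math.pi)   # makes a_dot/a = 1 via the velocity equation
    state = (1.0, 1.0, rho_start)
    dt = t_end / steps
    for _ in range(steps):
        state = rk4_step(state, dt, w)
    return state

a_end, a_dot_end, rho_end = evolve(w=0.0)         # dust
print(a_end, (1.0 + 1.5 * 2.0) ** (2.0 / 3.0))    # compare with a = (1 + 3t/2)^(2/3)
# The velocity equation should still hold at the end of the run:
print((a_dot_end / a_end) ** 2, 8.0 * math.pi / 3.0 * rho_end)
```

With these initial conditions the dust solution is $a = (1+3t/2)^{2/3}$, and the velocity equation, imposed only at the start, remains satisfied throughout the run. This illustrates that once the equation of state is supplied, the two Friedmann equations are mutually consistent rather than overdetermined.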


c / Alexander Friedmann (1888-1925).


Unlike a, $\rho$, and P, the cosmological constant $\Lambda$ is not free to
vary with time; if it did, then the stress-energy tensor would have a 
nonvanishing divergence, which is not consistent with the Einstein 
field equations (see p. 321). 


Although general relativity does not provide any scalar measure of
mass-energy that is globally conserved in all spacetimes,
the Friedmann velocity equation can be loosely interpreted
as a statement of conservation of mass-energy in an FRW spacetime.
The left-hand side acts like kinetic energy. In a cosmology that
expands and then recontracts in a Big Crunch, the turn-around point
is defined by the time at which the right-hand side equals zero. The
origin of the velocity equation is in fact the time-time part of the
field equations, whose source term is the mass-energy component of
the stress-energy tensor.


d / Example 15.


Example 15: Scooping out a hole
This example illustrates the connection between cosmological
acceleration and local density of matter given by the Friedmann
acceleration equation. Consider two cosmologies, each with $\Lambda = 0$.
Cosmology 1 is an FRW spacetime in which all matter is in the
form of nonrelativistic particles such as atoms or galaxies. Cosmology 2
is identical to 1, except that all the matter has been scooped out of a
small spherical region S, leaving a vacuum. (“Small” means small
compared to the Hubble scale $1/H_0$.) Within S, we introduce test
particles A and B. Because an FRW spacetime is homogeneous
and isotropic, cosmology 2 retains spherical symmetry about the
center of S. Since $\Lambda = 0$, Birkhoff’s theorem applies to 2, so 2
is flat inside S.¹⁶ Therefore in 2, the relative acceleration $\mathbf{a}$ of the
test particles equals zero.


Because S is small compared to cosmological distances, and because
the dust is nonrelativistic, local observers can accurately
attribute the difference in behavior between 1 and 2 to the Newtonian
gravitational force from the dust that was present in 1 but not
in 2. For convenience, let A and B both be initially at rest relative
to the local dust (i.e., having constant spatial coordinates). By the
definition of the scale factor (i.e., by inspection of the FRW metric),
the distance between them varies as const × a(t). If one of these
particles is an observer, she sees a “force” acting on the other particle
that causes an acceleration $(\ddot a/a)\mathbf{r}$, where $\mathbf{r}$ is the
displacement between the particles.


Since $\mathbf{a} = 0$ in 2, it follows that the acceleration in 1 can be
calculated accurately by finding the Newtonian gravitational force due
to the added dust. This results in a connection between $\ddot a/a$, on
the left-hand side of the Friedmann acceleration equation, and $\rho$,
on the right side.


For consistency, we can verify that the Newtonian gravitational 
force exerted by a uniform sphere, at a point on its interior, is 
proportional to r. This is a classic result that is easily derived 
from Newton’s shell theorem. 


¹⁶ People sometimes incorrectly overstate this conclusion about the gravity
inside a hole according to general relativity. In the case of a spherical shell 
of mass in an otherwise empty universe, it is true that the spacetime inside 
is flat, but there is time dilation inside the shell compared to time at infinity, 
and the Schwarzschild coordinates cannot be used inside the shell if they are to 
match up with Schwarzschild coordinates outside the shell. See Zhang and Yi, 
arxiv.org/abs/1203.4428. 
8.2.5 A singularity at the Big Bang


The Friedmann equations only allow a constant a in the case 
where $\Lambda$ is perfectly tuned relative to the other parameters, and
even this artificially fine-tuned equilibrium turns out to be unsta- 
ble. These considerations make a static cosmology implausible on 
theoretical grounds, and they are also consistent with the observed 
Hubble expansion (p. 323). 
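
The fine-tuning can be made explicit by setting $\dot a = 0$ and $\ddot a = 0$ in the two Friedmann equations. The short worked calculation below is a sketch that assumes the matter is dust (P = 0); the final comment about instability additionally assumes $\rho \propto a^{-3}$, as expected when dust is diluted by expansion.

```latex
% Static solution: set \dot a = 0 and \ddot a = 0, assuming dust (P = 0).
\begin{align*}
0 &= \tfrac{1}{3}\Lambda - \tfrac{4\pi}{3}\rho
  &&\Rightarrow\quad \Lambda = 4\pi\rho, \\
0 &= \tfrac{1}{3}\Lambda + \tfrac{8\pi}{3}\rho - ka^{-2}
  &&\Rightarrow\quad ka^{-2} = \tfrac{1}{3}\Lambda + \tfrac{2}{3}\Lambda = \Lambda .
\end{align*}
% So k = +1 and a = \Lambda^{-1/2}: both \Lambda and a are fixed by \rho.
% Instability: if a grows slightly, \rho drops below \Lambda/4\pi, the
% acceleration equation gives \ddot a > 0, and a grows further.
% If a shrinks slightly, the sign reverses and the contraction runs away.
```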


Since the universe is not static, what happens if we use general 
relativity to extrapolate farther and farther back in time? 


If we extrapolate the Friedmann equations backward in time, we 
find that they always have a = 0 at some point in the past, and this 
occurs regardless of the details of what we assume about the mat- 
ter and radiation that fills the universe. To see this, note that, as 
discussed in example 14 on page 132, radiation is expected to dom- 
inate the early universe, for generic reasons that are not sensitive 
to the (substantial) observational uncertainties about the universe’s 
present-day mixture of ingredients. Under radiation-dominated
conditions, we can approximate $\Lambda = 0$ and $P = \rho/3$ (example 14, p. 132)
in the first Friedmann equation, finding

$$\frac{\ddot a}{a} = -\frac{8\pi}{3}\rho,$$


where $\rho$ is the density of mass-energy due to radiation. Since $\ddot a/a$
is always negative, the graph of a(t) is always concave down, and
since a is currently increasing, there must be some time in the past
when a = 0. One can readily verify that this is not just a coordinate
singularity; the Ricci scalar curvature $R^a{}_a$ diverges, and the
singularity occurs at a finite proper time in the past.
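
The finite age can be exhibited explicitly in the radiation-dominated approximation. The following worked solution is a sketch under stated assumptions: $\Lambda = 0$, a curvature term that is negligible at early times, and the scaling $\rho = \rho_1 a^{-4}$ for radiation, where $\rho_1$ (a symbol introduced here for illustration) is the density when a = 1. The origin of t is chosen at a = 0.

```latex
% Radiation-dominated era, using the Friedmann velocity equation.
\begin{align*}
\left(\frac{\dot a}{a}\right)^2 &= \frac{8\pi}{3}\rho_1 a^{-4}
  \quad\Rightarrow\quad a\,\dot a = C, \qquad C = \sqrt{\tfrac{8\pi}{3}\rho_1}, \\
\tfrac{1}{2}a^2 &= C\,t
  \quad\Rightarrow\quad a = \sqrt{2Ct} \propto t^{1/2}, \\
\frac{\ddot a}{a} &= -\frac{C^2}{a^4} = -\frac{8\pi}{3}\rho , \qquad
\rho = \frac{\rho_1}{4C^2t^2} = \frac{3}{32\pi t^2}.
\end{align*}
```

The last line reproduces the acceleration equation above, and shows the density diverging as $t \to 0$, a finite time in the past.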


In section 6.3.1, we saw that a black hole contains a singularity, 
but it appears that such singularities are always hidden behind event 
horizons, so that we can never observe them from the outside. The 
FRW singularity, however, is not hidden behind an event horizon. 
It lies in our past light-cone, and our own world-lines emerged from 
it. The universe, it seems, originated in a Big Bang, a concept
first proposed by the Belgian Roman Catholic priest Georges
Lemaître.


Self-check: Why is it not correct to think of the Big Bang as an 
explosion that occurred at a specific point in space? 


Does the FRW singularity represent something real about our 
universe? 


One thing to worry about is the accuracy of our physical modeling
of the radiation-dominated universe. The presence of an initial
singularity in the FRW solutions does not depend sensitively on
assumptions like $P = \rho/3$, but it is still disquieting that no
laboratory experiment has ever come close to attaining the conditions
under which we could test whether a gas of photons produces
gravitational fields as predicted by general relativity. We saw on p. 299
that static electric fields do produce gravitational fields as predicted,
but this is not the same as an empirical confirmation that
electromagnetic waves also act as gravitational sources in exactly the
manner that general relativity claims. We do, however, have a
consistency check in the form of the abundances of nuclei. Calculations
of nuclear reactions in the early, radiation-dominated universe
predict certain abundances of hydrogen, helium, and deuterium. In
particular, the relative abundance of helium and deuterium is a
sensitive test of the relationships among $a$, $\dot a$, and $\ddot a$
predicted by the FRW equations, and they confirm these relationships
to a precision of about $5 \pm 4\%$.¹⁷


e / Georges Lemaître (1894-1966) proposed in 1927 that our
universe be modeled in general relativity as a spacetime in which
space expanded over time. Lemaître’s ideas were initially
treated skeptically by Eddington and Einstein, who told him,
“Your calculations are correct, but your physics is abominable.”
Later, as Hubble’s observational evidence for cosmological expansion
became widely accepted, both Einstein and Eddington
became converts, helping to bring Lemaître’s ideas to the
attention of the community. In 1931, an emboldened Lemaître
described the idea that the universe began from a “Primeval
Atom” or “Cosmic Egg.” The name that eventually stuck was
“Big Bang,” coined by Fred Hoyle as a derisive term.


An additional concern is whether the Big Bang singularity is
just a product of the unrealistic assumption of perfect symmetry
that went into the FRW cosmology. One of the Penrose-Hawking
singularity theorems proves that it is not.¹⁸ This particular
singularity theorem requires three conditions: (1) the strong energy
condition holds; (2) there are no closed timelike curves; and (3) a
trapped surface exists in the past timelike geodesics originating at
some point. The requirement of a trapped surface can fail if the
universe is inhomogeneous to $\gtrsim 10^{-4}$, but observations of the cosmic
microwave background rule out any inhomogeneity this large (see
p. 324). The other possible failure of the assumptions is that if the
cosmological constant is large enough, it violates the strong energy
condition, and we can have a Big Bounce rather than a Big Bang
(see p. 346).


An exceptional case: the Milne universe 


There is still a third loophole in our conclusion that the Big Bang 
singularity must have existed. Consider the special case of the FRW 
analysis, found by Milne in 1932 (long before FRW), in which the 
universe is completely empty, with $\rho = 0$ and $\Lambda = 0$. This is of
course not consistent with the fact that the universe contains stars 
and galaxies, but we might wonder whether it could tell us anything 
interesting as a simplified approximation to a very dilute universe. 
The result is that the scale factor a varies linearly with time (prob- 
lem 3, p. 367). If a is not constant, then there exists a time at which 
a = 0, but this doesn’t turn out to be a real singularity (which isn’t 
surprising, since there is no matter to create gravitational fields). 
Let this universe have a scattering of test particles whose masses
are too small to invalidate the approximation of $\rho = 0$, and let
the test particles be at rest in the $(r,\theta,\phi)$ coordinates. The linear
dependence of a on t means that these particles simply move
inertially and without any gravitational interactions, spreading apart
from one another at a constant rate like the raisins in a rising loaf
of raisin bread. The Friedmann equations require k = -1, so the
spatial geometry is one of constant negative curvature.


¹⁷ Steigman, Ann. Rev. Nucl. Part. Sci. 57 (2007) 463. These tests are
stated in terms of the Hubble “constant” $H = \dot a/a$, which is actually varying
over cosmological time-scales. The nuclear helium-deuterium ratio is sensitive
to $\dot H/H$.

¹⁸ Hawking and Ellis, “The Cosmic Black-Body Radiation and the Existence of
Singularities in Our Universe,” Astrophysical Journal, 152 (1968) 25. Available
online at articles.adsabs.harvard.edu.


The Milne universe is in fact flat spacetime described in tricky 
coordinates. The connection can be made as follows. Let a spher- 
ically symmetric cloud of test particles be emitted by an explosion 
that occurs at some arbitrarily chosen event in flat spacetime. Make 
the cloud’s density be nonuniform in a certain specific way, so that 
every observer moving along with a test particle (called a comoving 
observer) sees the same local conditions in his own frame; due to 
Lorentz contraction by a factor $\gamma$, this requires that the density be
proportional to $\gamma$ as described by the observer O who remained at
the origin. This scenario turns out to be identical to the Milne
universe under the change of coordinates from spatially flat coordinates
(T, R) to FRW coordinates (t, r), where $t = T/\gamma$ is the proper time
and $r = v\gamma$. (Cf. problem 12, p. 210.)
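
The claim that this is flat spacetime in disguise can be spot-checked numerically. The sketch below is an illustration with hypothetical function names, restricted to radial displacements. It maps (t, r) to (T, R) using $\gamma = \sqrt{1+r^2}$, which follows from $r = v\gamma$, and compares the flat interval $dT^2 - dR^2$ with the FRW form $dt^2 - a^2\,dr^2/(1-kr^2)$ for a = t and k = -1.

```python
import math

def to_flat(t, r):
    """Map Milne coordinates (t, r) to flat coordinates (T, R).

    Since r = v*gamma, gamma = sqrt(1 + r^2); then T = gamma*t and R = v*T = r*t.
    """
    gamma = math.sqrt(1.0 + r * r)
    return gamma * t, r * t

def flat_interval(t, r, dt, dr):
    """dT^2 - dR^2 for a small radial displacement, by finite differences."""
    T0, R0 = to_flat(t, r)
    T1, R1 = to_flat(t + dt, r + dr)
    return (T1 - T0) ** 2 - (R1 - R0) ** 2

def milne_interval(t, r, dt, dr):
    """dt^2 - a^2 dr^2 / (1 - k r^2), with a = t and k = -1."""
    return dt ** 2 - t ** 2 * dr ** 2 / (1.0 + r * r)

t0, r0 = 2.0, 0.7        # arbitrary sample event
dt, dr = 1e-5, 3e-6      # small displacement
# The two values agree up to terms of higher order in the displacement.
print(flat_interval(t0, r0, dt, dr), milne_interval(t0, r0, dt, dr))
```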


The Milne universe may be useful as an inoculation against
the common misconception that the Big Bang was an explosion of 
matter spreading out into a preexisting vacuum. Such a description 
seems obviously incompatible with homogeneity, since, for example, 
an observer at the edge of the cloud sees the cloud filling only half 
of the sky. But isn’t this a logical contradiction, since the Milne 
universe does have an explosion into vacuum, and yet it was derived 
as a special case of the FRW analysis, which explicitly assumed ho- 
mogeneity? It is not a contradiction, because a comoving observer 
never actually sees an edge. In the limit as we approach the edge, 
the density of the cloud (as seen by the observer who stayed at 
the origin) approaches infinity, and the Lorentz contraction also ap- 
proaches infinity, so that O considers them to be like Hamlet saying, 
“I could be bounded in a nutshell, and count myself a king of infinite
space.” This logic only works in the case of the Milne universe. The 
explosion-into-preexisting-vacuum interpretation fails in Big Bang 
cosmologies with $\rho \neq 0$.


8.2.6 Observability of expansion 
Brooklyn is not expanding! 


The proper interpretation of the expansion of the universe, as 
described by the Friedmann equations, can be tricky. The example 
of the Milne universe encourages us to imagine that the expansion 
would be undetectable, since the Milne universe can be described as 
either expanding or not expanding, depending on the choice of coor- 
dinates. A more general consequence of coordinate-independence is 
that relativity does not pick out any preferred distance scale. That 
is, if all our meter-sticks expand, and the rest of the universe expands
as well, we would have no way to detect the expansion. The flaw 
in this reasoning is that the Friedmann equations only describe the 
average behavior of spacetime. As dramatized in the classic Woody 
Allen movie “Annie Hall:” “Well, the universe is everything, and 
if it’s expanding, someday it will break apart and that would be 
the end of everything!” “What has the universe got to do with it? 
You’re here in Brooklyn! Brooklyn is not expanding!”


To organize our thoughts, let’s consider the following hypotheses: 


1. The distance between one galaxy and another increases at the 
rate given by a(t) (assuming the galaxies are sufficiently dis- 
tant from one another that they are not gravitationally bound 
within the same galactic cluster, supercluster, etc.). 


2. The wavelength of a photon increases according to a(t) as it 
travels cosmological distances. 


3. The size of the solar system increases at this rate as well (i.e., 
gravitationally bound systems get bigger, including the earth 
and the Milky Way). 


4. The size of Brooklyn increases at this rate (i.e., electromag- 
netically bound systems get bigger). 


5. The size of a helium nucleus increases at this rate (i.e., systems 
bound by the strong nuclear force get bigger). 


We can imagine that: 


• All the above hypotheses are true.


• All the above hypotheses are false, and in fact none of these
sizes increases at all.


• Some are true and some false.


If all five hypotheses were true, the expansion would be unde- 
tectable, because all available meter-sticks would be expanding to- 
gether. Likewise if no sizes were increasing, there would be nothing 
to detect. These two possibilities are really the same cosmology, 
described in two different coordinate systems. But the Ricci and 
Einstein tensors were carefully constructed so as to be intrinsic. 
The fact that the expansion affects the Einstein tensor shows that 
it cannot be interpreted as a mere coordinate expansion. Specifically,
suppose someone tells you that the FRW metric can be made into 
a flat metric by a change of coordinates. (I have come across this 
claim on internet forums.) The linear structure of the tensor trans- 
formation equations guarantees that a nonzero tensor can never be 
made into a zero tensor by a change of coordinates. Since the
Einstein tensor is nonzero for an FRW metric, and zero for a flat metric,
the claim is false. 


Self-check: The reasoning above implicitly assumed a non-empty 
universe. Convince yourself that it fails in the special case of the 
Milne universe. 


We can now see some of the limitations of a common metaphor 
used to explain cosmic expansion, in which the universe is visual- 
ized as the surface of an expanding balloon. The metaphor correctly 
gets across several ideas: that the Big Bang is not an explosion that 
occurred at a preexisting point in empty space; that hypothesis 1 
above holds; and that the rate of recession of one galaxy relative 
to another is proportional to the distance between them. Neverthe- 
less the metaphor may be misleading, because if we take a laundry 
marker and draw any structure on the balloon, that structure will 
expand at the same rate. But this implies that hypotheses 1-5 all 
hold, which cannot be true. 


Some of the five hypotheses must be true and some false,
and we would like to sort out which are which. It should also be
clear by now that these are not five independent hypotheses. For 
example, we can test empirically whether the ratio of Brooklyn’s 
size to the distances between galaxies changes like a(t), remains 
constant, or changes with some other time dependence, but it is 
only the ratio that is actually observable. 


Empirically, we find that hypotheses 1 and 2 are true (i.e., the 
photon’s wavelength maintains a constant ratio with the intergalac- 
tic distance scale), while 3, 4, and 5 are false. For example, the 
orbits of the planets in our solar system have been measured ex- 
tremely accurately by radar reflection and by signal propagation 
times to space probes, and no expanding trend is detected. 


General-relativistic predictions 


Does general relativity correctly reproduce these observations? 
General relativity is mainly a theory of gravity, so it should be well 
within its domain to explain why the solar system does not ex- 
pand detectably while intergalactic distances do. It is impractical 
to solve the Einstein field equations exactly so as to describe the 
internal structure of all the bodies that occupy the universe: galax- 
ies, superclusters, etc. We can, however, handle simple cases, as 
in example 20 on page 345, where we display an exact solution for 
the case of a universe containing only two things: an isolated black 
hole, and an energy density described by a cosmological constant. 
We find that the characteristic scale of the black hole, i.e., the radius 
of its event horizon, does not increase with time. A fuller treatment 
of these issues is given on p. 350, after some facts about realis- 
tic cosmologies have been established. The result is that although 
bound systems like the solar system are in some cases predicted to
expand, the expansion is absurdly small, too small to measure, and 
much smaller than the rate of expansion of the universe in general as 
represented by the scale factor a(t). This agrees with observation. 


It is easy to show that atoms and nuclei do not steadily expand
over time, because such an expansion would violate either
the equivalence principle or the basic properties of quantum me- 
chanics. One way of stating the equivalence principle is that the 
local geometry of spacetime is always approximately Lorentzian, so 
that the laws of physics do not depend on one’s position or state of 
motion. Among these laws of physics are the principles of quantum 
mechanics, which imply that an atom or a nucleus has a well-defined 
ground state, with a certain size that depends only on fundamental 
constants such as Planck’s constant and the masses of the particles 
involved. Atoms and nuclei do experience deformation due to gravi- 
tational strains (examples 24-25, p. 351), but these deformations do 
not increase with time, and would only be detectable if cosmological 
expansion were to accelerate radically (example 26, p. 352). 


This is different from the case of a photon traveling across the 
universe. The argument given above fails, because the photon does 
not have a ground state. The photon does expand, and this is 
required by the correspondence principle. If the photon did not ex- 
pand, then its wavelength would remain constant, and this would 
be inconsistent with the classical theory of electromagnetism, which 
predicts a Doppler shift due to the relative motion of the source 
and the observer. One can choose to describe cosmological redshifts 
either as Doppler shifts or as expansions of wavelength due to cos- 
mological expansion. 
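
The statement that a photon's wavelength keeps a constant ratio with the intergalactic distance scale can be written compactly. The following is a sketch in notation not used above: $t_e$ and $t_o$ label the times of emission and observation, and $z$ is the conventional redshift parameter.

```latex
\frac{\lambda_o}{\lambda_e} = \frac{a(t_o)}{a(t_e)} \equiv 1 + z
```

For example, light emitted when the scale factor was half its present value arrives with its wavelength doubled, $z = 1$.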


A nice way of discussing atoms, nuclei, photons, and solar sys- 
tems all on the same footing is to note that in geometrized units, 
the units of mass and length are the same. Therefore the existence 
of any fundamental massive particle sets a universal length scale, 
one that will be known to any intelligent species anywhere in the 
universe. Since photons are massless, they can’t be used to set a 
universal scale in this way; a photon has a certain mass-energy, but 
that mass-energy can take on any value. Similarly, a solar system 
sets a length scale, but not a universal one; the radius of a planet’s 
orbit can take on any value. A universe without massive fundamen- 
tal particles would be a universe without length measurement. It 
would obey the laws of conformal geometry, in which angles and 
light-cones were the only measures. This is the reason that atoms 
and nuclei, which are made of massive fundamental particles, do not 
expand. 


More than one dimension required 


Another good way of understanding why a photon expands, 
while an atom does not, is to recall that a one-dimensional space 
can never have any intrinsic curvature. If the expansion of atoms
were to be detectable, we would need to detect it by comparing 
against some other meter-stick. Let’s suppose that a hydrogen atom 
expands more, while a more tightly bound uranium atom expands 
less, so that over time, we can detect a change in the ratio of the two 
atoms’ sizes. The world-lines of the two atoms are one-dimensional 
curves in spacetime. They are housed in a laboratory, and although 
the laboratory does have some spatial extent, the equivalence prin- 
ciple guarantees that to a good approximation, this small spatial 
extent doesn’t matter. This implies an intrinsic curvature in a one- 
dimensional space, which is mathematically impossible, so we have 
a proof by contradiction that atoms do not expand steadily.


Now why does this one-dimensionality argument fail for photons 
and galaxies? For a pair of galaxies, it fails because the galaxies are 
not sufficiently close together to allow them both to be covered by 
a single Lorentz frame, and therefore the set of world-lines com- 
prising the observation cannot be approximated well as lying within 
a one-dimensional space. Similar reasoning applies for cosmologi- 
cal redshifts of photons received from distant galaxies. One could 
instead propose flying along in a spaceship next to an electromag- 
netic wave, and monitoring the change in its wavelength while it is 
in flight. All the world-lines involved in such an experiment would 
indeed be confined to a one-dimensional space. The experiment is 
impossible, however, because the measuring apparatus cannot be 
accelerated to the speed of light. In reality, the speed of the light 
wave relative to the measuring apparatus will always equal c, so the 
two world-lines involved in the experiment will diverge, and will not 
be confined to a one-dimensional region of spacetime. 


A cosmic girdle Example: 16 
Since cosmic expansion has no significant effect on Brooklyn, nu- 
clei, and solar systems, we might be tempted to infer that its ef- 
fect on any solid body would also be negligible. To see that this 
is not true, imagine that we live in a closed universe, and the uni- 
verse has a leather belt wrapping around it on a closed spacelike 
geodesic. All parts of the belt are initially at rest relative to the 
local galaxies, and the tension is initially zero everywhere. The 
belt must stretch and eventually break: for if not, then it could 
not remain everywhere at rest with respect to the local galaxies, 
and this would violate the symmetry of the initial conditions, since 
there would be no way to pick the direction in which a certain part 
of the belt should begin accelerating. 


Østvang’s quasi-metric relativity Example: 17
Over the years, a variety of theories of gravity have been pro- 
posed as alternatives to general relativity. Some of these, such 
as the Brans-Dicke theory, remain viable, i.e., they are consis- 
tent with all the available experimental data that have been used 
to test general relativity. One of the most important reasons for 
trying to construct such theories is that it can be impossible to
interpret tests of general relativity’s predictions unless one also 
possesses a theory that predicts something different. This issue, 
for example, has made it impossible to test Einstein’s century-old 
prediction that gravitational effects propagate at c, since there is 
no viable theory available that predicts any other speed for them 
(see section 9.1). 


Østvang (arxiv.org/abs/gr-qc/0112025v6) has proposed an alternative
theory of gravity, called quasi-metric relativity, which,
unlike general relativity, predicts a significant cosmological ex- 
pansion of the solar system, and which is claimed to be able 
to explain the observation of small, unexplained accelerations of 
the Pioneer space probes that remain after all accelerations due 
to known effects have been subtracted (the “Pioneer anomaly”). 
We've seen above that there are a variety of arguments against 
such an expansion of the solar system, and that many of these 
arguments do not require detailed technical calculations but only 
knowledge of certain fundamental principles, such as the struc- 
ture of differential geometry (no intrinsic curvature in one dimen- 
sion), the equivalence principle, and the existence of ground states 
in quantum mechanics. We therefore expect that Østvang’s theory,
if it is logically self-consistent, will probably violate these assumptions,
but that the violations must be relatively small if the
theory is claimed to be consistent with existing observations. This 
is in fact the case. The theory violates the strictest form of the 
equivalence principle. 


Over the years, a variety of explanations have been proposed
for the Pioneer anomaly, including both glamorous ones (a modification
of the 1/r² law of gravitational forces) and others more
pedestrian (effects due to outgassing of fuel, radiation pressure
from sunlight, or infrared radiation originating from the spacecraft’s
radioisotope thermoelectric generator). Calculations by Iorio¹⁹
in 2006-2009 show that if the force law for gravity is modified
in order to explain the Pioneer anomalies, and if gravity obeys the
equivalence principle, then the results are inconsistent with the
observed orbital motion of the satellites of Neptune. This makes
gravitational explanations unlikely, but does not obviously rule out
Østvang’s theory, since the theory is not supposed to obey the
equivalence principle. Østvang says²⁰ that his theory predicts an
expansion of ∼1 m/yr in the orbit of Neptune’s moon Nereid, which
is consistent with observation.


In December 2010, the original discoverers of the effect made 
a statement in the popular press that they had a new analysis, 
which they were preparing to publish in a scientific paper, in which 
the size of the anomaly would be drastically revised downward,
with a far greater proportion of the acceleration being accounted 
for by thermal effects. In my opinion this revision, combined with 
the putative effect’s violation of the equivalence principle, makes it
clear that the anomaly is not gravitational. 


Does space expand? 


Finally, the balloon metaphor encourages us to interpret cosmo- 
logical expansion as a phenomenon in which space itself expands, 
or perhaps one in which new space is produced. Does space really 
expand? Without posing the question in terms of more rigorously 
defined, empirically observable quantities, we can’t say yes or no. It 
is merely a matter of which definitions one chooses and which con- 
ceptual framework one finds easier and more natural to work within. 
Bunn and Hogg have stated the minority view against expansion of 
space²¹, while the opposite opinion is given by Francis et al.²²


As an example of a self-consistent set of definitions that lead 
to the conclusion that space does expand, Francis et al. give the 
following. Define eight observers positioned at the corners of a cube, 
at cosmological distances from one another. Let each observer be 
at rest relative to the local matter and radiation that were used as 
ingredients in the FRW cosmology. (For example, we know that our 
own solar system is not at rest in this sense, because we observe 
that the cosmic microwave background radiation is slightly Doppler 
shifted in our frame of reference.) Then these eight observers will 
observe that, over time, the volume of the cube grows as expected 
according to the cube of the function a(t) in the FRW model. 


This establishes that expansion of space is a plausible interpreta- 
tion. To see that it is not the only possible interpretation, consider 
the following example. A photon is observed after having traveled 
to earth from a distant galaxy G, and is found to be red-shifted. Al- 
ice, who likes expansion, will explain this by saying that while the 
photon was in flight, the space it occupied expanded, lengthening 
its wavelength. Betty, who dislikes expansion, wants to interpret it 
as a kinematic red shift, arising from the motion of galaxy G relative
to the Milky Way galaxy, M. If Alice and Betty’s disagreement
is to be decided as a matter of absolute truth, then we need some 
objective method for resolving an observed redshift into two terms, 
one kinematic and one gravitational. But we’ve seen in section 7.4 
on page 278 that this is only possible for a stationary spacetime, 
and cosmological spacetimes are not stationary: regardless of an 
observer’s state of motion, he sees a change over time in observables 
such as density of matter and curvature of spacetime. As an ex- 
treme example, suppose that Betty, in galaxy M, receives a photon 
without realizing that she lives in a closed universe, and the photon
has made a circuit of the cosmos, having been emitted from her
own galaxy in the distant past. If she insists on interpreting this as
a kinematic red shift, then she must conclude that her galaxy M is
moving at some extremely high velocity relative to itself. This is in
fact not an impossible interpretation, if we say that M’s high velocity
is relative to itself in the past. An observer who sets up a frame
of reference with its origin fixed at galaxy G will happily confirm
that M has been accelerating over the eons. What this demonstrates
is that we can split up a cosmological red shift into kinematic and
gravitational parts in any way we like, depending on our choice of
coordinate system (see also p. 285).


²¹ http://arxiv.org/abs/0808.1081v2
²² http://arxiv.org/abs/0707.0380v1


A cosmic whip Example: 18 
The cosmic girdle of example 16 on p. 337 does not transmit any 
information from one part of the universe to another, for its state 
is the same everywhere by symmetry, and therefore an observer 
near one part of the belt gets no information that is any different 
from what would be available to an observer anywhere else. 


Now suppose that the universe is open rather than closed, but 
we have a rope that, just like the belt, stretches out over cosmic 
distances along a spacelike geodesic. If the rope is initially at 
rest with respect to a particular galaxy G (or, more strictly speak- 
ing, with respect to the locally averaged cosmic medium), then by 
symmetry the rope will always remain at rest with respect to G, 
since there is no way for the laws of physics to pick a direction in 
which it should accelerate. Now the residents of G cut the rope, 
release half of it, and tie the other half securely to one of G’s spi- 
ral arms using a square knot. If they do this smoothly, without 
varying the rope’s tension, then no vibrations will propagate, and 
everything will be as it was before on that half of the rope. (We 
assume that G is so massive relative to the rope that the rope 
does not cause it to accelerate significantly.) 


Can observers at distant points observe the tail of the rope whip- 
ping by at a certain speed, and thereby infer the velocity of G 
relative to them? This would produce all kinds of strange con- 
clusions. For one thing, the Hubble law says that this velocity is 
directly proportional to the length of the rope, so by making the 
rope long enough we could make this velocity exceed the speed 
of light. We’ve also convinced ourselves that the relative veloc- 
ity of cosmologically distant objects is not even well defined in 
general relativity, so it clearly can’t make sense to interpret the 
rope-end’s velocity in that way. 


The way out of the paradox is to recognize that disturbances can 
only propagate along the rope at a certain speed v. Let’s say that 
the information is transmitted in the form of longitudinal vibrations, 
in which case it propagates at the speed of sound. For a rope 
made out of any known material, this is far less than the speed 
of light, and we’ve also seen in example 14 on page 64 and in
problem 4 on page 84 that relativity places fundamental limits 
on the properties of all possible materials, guaranteeing v < c. 
We can now see that all we’ve accomplished with the rope is to 
recapitulate using slower sound waves the discussion that was 
carried out on page 339 using light waves. The sound waves 
may perhaps preserve some information about the state of motion 
of galaxy G long ago, but all the same ambiguities apply to its 
interpretation as in the case of light waves — and in addition, we 
suspect that the rope has long since parted somewhere along its 
length. 
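

The Hubble-law estimate used in example 18 is easy to make quantitative. The sketch below assumes a round value of the Hubble constant, H0 = 70 km/s/Mpc, which is not taken from the text, and the function names are purely illustrative. It computes the rope length at which the naive recession velocity v = H0 L would equal c.

```python
# Naive Hubble-law velocity of the far end of a rope of proper length L.
# ASSUMPTION: H0 = 70 km/s/Mpc is a round illustrative value, not from the text.

C_KM_S = 299_792.458        # speed of light in km/s (exact by definition)
H0_KM_S_PER_MPC = 70.0      # assumed Hubble constant


def hubble_velocity(length_mpc, h0=H0_KM_S_PER_MPC):
    """Recession velocity in km/s given by the Hubble law v = H0 * L."""
    return h0 * length_mpc


def critical_length_mpc(h0=H0_KM_S_PER_MPC):
    """Rope length in Mpc at which the Hubble-law velocity equals c."""
    return C_KM_S / h0


if __name__ == "__main__":
    length = critical_length_mpc()
    print(f"Hubble-law velocity reaches c at L = {length:.0f} Mpc")
```

With the assumed H0 this length is a few thousand megaparsecs. As the example argues, a velocity obtained this way is a coordinate-dependent bookkeeping quantity, not something a distant observer could read off the rope.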


8.2.7 The vacuum-dominated solution 


For 70 years after Hubble’s discovery of cosmological expansion, 
the standard picture was one in which the universe expanded, but 
the expansion must be decelerating. The deceleration is predicted 
by the special cases of the FRW cosmology that were believed to 
be applicable, and even if we didn’t know anything about general 
relativity, it would be reasonable to expect a deceleration due to the 
mutual Newtonian gravitational attraction of all the mass in the 
universe. 


But observations of distant supernovae starting around 1998 in- 
troduced a further twist in the plot. In a binary star system con- 
sisting of a white dwarf and a non-degenerate star, as the non- 
degenerate star evolves into a red giant, its size increases, and it 
can begin dumping mass onto the white dwarf. This can cause the 
white dwarf to exceed the Chandrasekhar limit (page 144), resulting 
in an explosion known as a type Ia supernova. Because the Chan- 
drasekhar limit provides a uniform set of initial conditions, the be- 
havior of type Ia supernovae is fairly predictable, and in particular 
their luminosities are approximately equal. They therefore provide 
a kind of standard candle: since the intrinsic brightness is known, 
the distance can be inferred from the apparent brightness. Given 
the distance, we can infer the time that was spent in transit by the 
light on its way to us, i.e. the look-back time. From measurements 
of Doppler shifts of spectral lines, we can also find the velocity at 
which the supernova was receding from us. The result is that we 
can measure the universe’s rate of expansion as a function of time. 
Observations show that this rate of expansion has been accelerating. 
The Friedmann equations show that this can only occur for $\Lambda \gtrsim 4\pi\rho$.
This picture has been independently verified by measurements of the 
cosmic microwave background (CMB) radiation. A more detailed 
discussion of the supernova and CMB data is given in section 8.2.11 
on page 354. 
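

The standard-candle inference can be sketched numerically. The snippet below uses only the inverse-square law, F = L/(4πd²), and deliberately ignores the redshift and curvature corrections that a real cosmological analysis requires. The function names and the sample numbers are illustrative assumptions, not data from the text.

```python
import math


def candle_distance(luminosity_w, flux_w_m2):
    """Distance in meters from the inverse-square law F = L / (4 pi d^2)."""
    return math.sqrt(luminosity_w / (4.0 * math.pi * flux_w_m2))


def look_back_time_s(distance_m, c=299_792_458.0):
    """Light travel time in seconds, treating space as static (a crude approximation)."""
    return distance_m / c


if __name__ == "__main__":
    # Illustrative numbers only: a 1e36 W source observed at 1e-15 W/m^2.
    d = candle_distance(1.0e36, 1.0e-15)
    print(d, look_back_time_s(d))
```

The design point is that the luminosity is the same for every type Ia event, so the measured flux alone fixes the distance: a source that appears four times fainter is twice as far away.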


With hindsight, we can see that in a quantum-mechanical con- 
text, it is natural to expect that fluctuations of the vacuum, required 
by the Heisenberg uncertainty principle, would contribute to the cosmological
constant, and in fact models tend to overpredict Λ by a
factor of about $10^{120}$! From this point of view, the mystery is why
these effects cancel out so precisely. A correct understanding of the 
cosmological constant presumably requires a full theory of quantum 
gravity, which is presently far out of our reach. 


The latest data show that our universe, in the present epoch, is
dominated by the cosmological constant, so as an approximation we
can write the Friedmann equations as

$$\frac{\ddot{a}}{a} = \frac{1}{3}\Lambda$$
$$\left(\frac{\dot{a}}{a}\right)^2 = \frac{1}{3}\Lambda .$$

This is referred to as a vacuum-dominated universe or the de Sitter
spacetime. The solution is

$$a = \exp\left[\sqrt{\frac{\Lambda}{3}}\;t\right],$$

where observations show that $\Lambda \sim 10^{-26}$ kg/m$^3$, giving $\sqrt{3/\Lambda} \sim 10^{11}$ years.
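

As a check on the orders of magnitude, the time scale √(3/Λ) can be converted to years for a cosmological constant of order 10⁻²⁶ kg/m³. The sketch assumes that this value is a mass-density equivalent, so that multiplying by G/c² turns it into an inverse area. The constants are standard rounded values and the function name is illustrative.

```python
import math

G = 6.674e-11               # Newton's constant, m^3 kg^-1 s^-2
C = 299_792_458.0           # speed of light, m/s
SECONDS_PER_YEAR = 3.156e7  # approximate length of a year in seconds


def de_sitter_time_years(lambda_kg_m3):
    """Time scale T = sqrt(3/Lambda) in years, for Lambda given in kg/m^3."""
    lambda_inv_m2 = lambda_kg_m3 * G / C**2       # geometrized units: 1/m^2
    length_m = math.sqrt(3.0 / lambda_inv_m2)     # sqrt(3/Lambda) as a length
    return length_m / C / SECONDS_PER_YEAR        # light travel time in years


if __name__ == "__main__":
    print(f"{de_sitter_time_years(1e-26):.1e} years")
```

The result comes out at several times 10¹⁰ years, consistent with the order-of-magnitude estimate of 10¹¹ years.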


The implications for the fate of the universe are depressing. All 
parts of the universe will accelerate away from one another faster and 
faster as time goes on. The relative separation between two objects, 
say galaxy A and galaxy B, will eventually be increasing faster than 
the speed of light. (The Lorentzian character of spacetime is local, 
so relative motion faster than c is only forbidden between objects 
that are passing right by one another.) At this point, an observer 
in either galaxy will say that the other one has passed behind an 
event horizon. If intelligent observers do actually exist in the far 
future, they may have no way to tell that the cosmos even exists. 
They will perceive themselves as living in island universes, such as 
we believed our own galaxy to be a hundred years ago. 


When I introduced the standard cosmological coordinates on 
page 326, I described them as coordinates in which events that 
are simultaneous according to this t are events at which the local 
properties of the universe are the same. In the case of a perfectly 
vacuum-dominated universe, however, this notion loses its meaning. 
The only observable local property of such a universe is the vacuum 
energy described by the cosmological constant, and its density is al- 
ways the same, because it is built into the structure of the vacuum. 
Thus the vacuum-dominated cosmology is a special one that is maximally
symmetric, in the sense that it has not only the symmetries of
homogeneity and isotropy that we’ve been assuming all along, but 
also a symmetry with respect to time: it is a cosmology without 
history, in which all times appear identical to a local observer. One 
way of checking this claim is by calculating curvature scalars, and 
we find, for example, that the Ricci scalar is a constant $R = -12\Lambda$
(with the sign depending on the $+---$ signature, example 26,
p. 206).


In the special case of this cosmology, the time variation of the 
scaling factor a(t) is unobservable, and may be thought of as the 
unfortunate result of choosing an inappropriate set of coordinates, 
which obscure the underlying symmetry. When I argued in section 
8.2.6 for the observability of the universe’s expansion, note that all 
my arguments assumed the presence of matter or radiation. These 
are completely absent in a perfectly vacuum-dominated cosmology. 


For these reasons de Sitter originally proposed this solution as a 
static universe in 1917. But by 1920 it was realized that this was an
oversimplification. The argument above only shows that the time 
variation of a(t) does not allow us to distinguish one epoch of the 
universe from another. That is, we can’t look out the window and 
infer the date (e.g., from the temperature of the cosmic microwave 
background radiation). It does not, however, imply that the uni- 
verse is static in the sense that had been assumed until Hubble’s 
observations. The r-t part of the metric is 


$$ds^2 = dt^2 - a^2\,dr^2,$$


where a blows up exponentially with time, and the k-dependence
has been neglected, as it was in the approximation to the Friedmann
equations used to derive a(t).²³ Let a test particle travel in the radial
direction, starting at event $A = (0, 0)$ and ending at $B = (t', r')$. In
flat space, a world-line of the linear form r = vt would be a geodesic
connecting A and B; it would maximize the particle’s proper time.
But in this metric, it cannot be a geodesic. The curvature of
geodesics relative to a line on an r-t plot is most easily understood
in the limit where $t'$ is fairly long compared to the time-scale
$T = \sqrt{3/\Lambda}$ of the exponential, so that $a(t')$ is huge. The particle’s best
strategy for maximizing its proper time is to make sure that its dr
is extremely small when a is extremely large. The geodesic must
therefore have nearly constant r at the end. This makes it sound as
though the particle was decelerating, but in fact the opposite is true.
If r is constant, then the particle’s spacelike distance from the origin
is just ra(t), which blows up exponentially. The near-constancy of
the coordinate r at large t actually means that the particle’s motion
at large t isn’t really due to the particle’s inertial memory of its
original motion, as in Newton’s first law. What happens instead
is that the particle’s initial motion allows it to move some distance
away from the origin during a time on the order of T, but after that,
the expansion of the universe has become so rapid that the particle’s
motion simply streams outward because of the expansion of space
itself. Its initial motion only mattered because it determined how
far out the particle got before being swept away by the exponential
expansion.


²³ A computation of the Einstein tensor with $ds^2 = dt^2 - a^2(1 - kr^2)^{-1}\,dr^2$
shows that k enters only via a factor of the form $(\ldots)e^{(\ldots)t} + (\ldots)k$. For large t, the
k term becomes negligible, and the Einstein tensor becomes $G^a{}_b = g^a{}_b\Lambda$. This is
consistent with the approximation we used in deriving the solution, which was
to ignore both the source terms and the k term in the Friedmann equations.
The exact solutions with $\Lambda > 0$ and k = −1, 0, and 1 turn out in fact to be
equivalent except for a change of coordinates.
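

The qualitative claim that the coordinate r of a launched particle levels off can be checked by integrating the geodesic equations directly. The sketch below assumes the 1+1-dimensional metric ds² = dt² − a² dr² with a = exp(t), i.e., units in which the time scale of the exponential is 1, and uses the particle's proper time as the affine parameter. The Runge-Kutta integrator, the function names, and the launch parameter p are illustrative choices, not taken from the text.

```python
import math


def rhs(state):
    """Geodesic equations for ds^2 = dt^2 - a^2 dr^2 with a = exp(t).

    state = (t, r, dt/dlam, dr/dlam), where lam is the particle's proper time.
    """
    t, r, tdot, rdot = state
    a = math.exp(t)
    adot = a                                    # da/dt for a = exp(t)
    tddot = -a * adot * rdot**2                 # -Gamma^t_rr (dr/dlam)^2
    rddot = -2.0 * (adot / a) * tdot * rdot     # -2 Gamma^r_tr (dt/dlam)(dr/dlam)
    return (tdot, rdot, tddot, rddot)


def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))

    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, h / 2))
    k3 = rhs(shifted(state, k2, h / 2))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def launch(p, lam_max=8.0, h=1e-3):
    """Launch a particle radially from t = 0, r = 0 with dr/dlam = p.

    Returns the final (t, r, dt/dlam, dr/dlam) after proper time lam_max.
    """
    # Initial dt/dlam chosen so that (dt/dlam)^2 - a^2 (dr/dlam)^2 = 1 at a = 1.
    state = (0.0, 0.0, math.sqrt(1.0 + p * p), p)
    for _ in range(int(round(lam_max / h))):
        state = rk4_step(state, h)
    return state


if __name__ == "__main__":
    for lam_max in (2.0, 4.0, 8.0):
        t, r, _, _ = launch(1.0, lam_max)
        print(f"lam = {lam_max}: t = {t:.3f}, r = {r:.6f}, a*r = {math.exp(t) * r:.3f}")
```

In this sketch r stops changing after a few units of time, while the proper distance a(t) r keeps growing exponentially, which is the behavior described above.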


Geodesics in a vacuum-dominated universe Example: 19 
In this example we confirm the above interpretation in the special 
case where the particle, rather than being released in motion at 
the origin, is released at some nonzero radius r, with dr/dt = 0 
initially. First we recall the geodesic equation 


$$\frac{d^2 x^i}{d\lambda^2} + \Gamma^i{}_{jk}\frac{dx^j}{d\lambda}\frac{dx^k}{d\lambda} = 0$$

from page 179. The nonvanishing Christoffel symbols for the 1+1-dimensional
metric $ds^2 = dt^2 - a^2\,dr^2$ are $\Gamma^r{}_{tr} = \dot{a}/a$ and $\Gamma^t{}_{rr} = a\dot{a}$.
Setting T = 1 for convenience, we have $\Gamma^r{}_{tr} = 1$ and $\Gamma^t{}_{rr} = e^{2t}$.
We conjecture that the particle remains at the same value of r.
Given this conjecture, the particle’s proper time $\int ds$ is simply the


same as its time coordinate t, and we can therefore use t as an
affine coordinate. Letting $\lambda = t$, we have


$$\frac{d^2 t}{dt^2} + \Gamma^t{}_{rr}\,\dot{r}^2 = 0$$
$$0 + \Gamma^t{}_{rr}\,\dot{r}^2 = 0$$
$$\dot{r} = 0$$
$$r = \text{constant}$$


This confirms the self-consistency of the conjecture that r = constant 
is a geodesic. 
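

For readers who want to check the two Christoffel symbols used in this example, here is a brief sketch of the computation from the standard formula for the Christoffel symbols in terms of the metric, with $g_{tt} = 1$ and $g_{rr} = -a^2$ read off from $ds^2 = dt^2 - a^2\,dr^2$:

```latex
\Gamma^r{}_{tr} = \tfrac{1}{2}\, g^{rr}\,\partial_t g_{rr}
                = \tfrac{1}{2}\left(-\frac{1}{a^2}\right)\left(-2a\dot{a}\right)
                = \frac{\dot{a}}{a},
\qquad
\Gamma^t{}_{rr} = -\tfrac{1}{2}\, g^{tt}\,\partial_t g_{rr}
                = -\tfrac{1}{2}\left(-2a\dot{a}\right)
                = a\dot{a}
```

With $a = e^{t}$ these reduce to $1$ and $e^{2t}$.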


Note that we never actually had to use the actual expressions for 
the Christoffel symbols; we only needed to know which of them 
vanished and which didn’t. The conclusion depended only on the 
fact that the metric had the form $ds^2 = dt^2 - a^2\,dr^2$ for some
function a(t). This provides a rigorous justification for the inter- 
pretation of the cosmological scale factor a as giving a universal 
time-variation on all distance scales. 


The calculation also confirms that there is nothing special about 
r = 0. A particle released with r = 0 and $\dot{r} = 0$ initially stays at
r = 0, but a particle released at any other value of r also stays 
at that r. This cosmology is homogeneous, so any point could 
have been chosen as r = 0. If we sprinkle test particles, all at 
rest, across the surface of a sphere centered on this arbitrarily 
chosen point, then they will all accelerate outward relative to one 
another, and the volume of the sphere will increase. This is ex- 
actly what we expect. The Ricci curvature is interpreted as the 
second derivative of the volume of a region of space defined by
test particles in this way. The fact that the second derivative is 
positive rather than negative tells us that we are observing the 
kind of repulsion provided by the cosmological constant, not the 
attraction that results from the existence of material sources. 


Schwarzschild-de Sitter space Example: 20 
The metric

$$ds^2 = \left(1 - \frac{2m}{r} - \frac{1}{3}\Lambda r^2\right)dt^2 - \frac{dr^2}{1 - \frac{2m}{r} - \frac{1}{3}\Lambda r^2} - r^2\,d\theta^2 - r^2\sin^2\theta\,d\phi^2$$


is an exact solution to the Einstein field equations with cosmological
constant Λ, and can be interpreted as a universe in which
the only mass is a black hole of mass m located at r = 0. Near
the black hole, the Λ terms become negligible, and this is simply
the Schwarzschild metric. As argued in section 8.2.6, page 333,
this is a simple example of how cosmological expansion does not 
cause all structures in the universe to grow at the same rate. 
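

To see concretely that the black hole's characteristic scale is fixed, one can locate the horizons of this metric, which are the zeros of the function 1 − 2m/r − (1/3)Λr². The sketch below is a minimal bisection search in geometrized units. The sample values m = 1 and Λ = 10⁻⁴ are illustrative assumptions, and the helper names are hypothetical. It assumes 9Λm² < 1, so that both a black-hole horizon and a cosmological horizon exist.

```python
def f(r, m, lam):
    """Metric function 1 - 2m/r - (1/3) Lambda r^2 of the Schwarzschild-de Sitter metric."""
    return 1.0 - 2.0 * m / r - lam * r * r / 3.0


def bisect_root(func, lo, hi, tol=1e-12):
    """Find a root of func in [lo, hi] by bisection; func must change sign there."""
    flo = func(lo)
    if flo * func(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol * max(1.0, abs(hi)):
        mid = 0.5 * (lo + hi)
        fmid = func(mid)
        if flo * fmid <= 0:
            hi = mid
        else:
            lo, flo = mid, fmid
    return 0.5 * (lo + hi)


def horizons(m, lam):
    """Return (black-hole horizon, cosmological horizon) for small positive Lambda."""
    g = lambda r: f(r, m, lam)
    r_peak = (3.0 * m / lam) ** (1.0 / 3.0)   # f is maximal here, between the two roots
    r_bh = bisect_root(g, m, r_peak)
    r_cosmo = bisect_root(g, r_peak, 2.0 * (3.0 / lam) ** 0.5)
    return r_bh, r_cosmo


if __name__ == "__main__":
    print(horizons(1.0, 1e-4))
```

Because the metric coefficients do not depend on t, neither root changes with time. For small Λ the inner root sits just outside the Schwarzschild value 2m.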


Conservation of energy-momentum Example: 21 
Suppose that we assume the de Sitter geometry, and ask what 
type of matter fields are necessary to create it. We know that a 
cosmological constant will do the job, but could we have some 
other matter field that would also work? Suppose that the matter 
field is constrained to be a perfect fluid. The total stress-energy
is then of the form $T^\mu{}_\nu = \operatorname{diag}(\rho, -P, -P, -P)$ in Cartesian coordinates.
(See example 4 on p. 304 for the signs, some of which depend
on our use of the $+---$ signature.) The divergence $\nabla_\mu T^\mu{}_t$
measures the rate at which an observer says energy is being created,
and we need this to be zero. This expression is one of those
tricky examples where the covariant derivative can be nonzero
even when the thing being differentiated vanishes identically. The
divergence is $\nabla_t T^t{}_t + \nabla_x T^x{}_t$, and the term that doesn’t vanish is
the second one, even though $T^x{}_t = 0$. Using the nonvanishing
Christoffel symbols this becomes $\Gamma^x{}_{tx} T^t{}_t - \Gamma^x{}_{xt} T^x{}_x = (\dot{a}/a)(\rho + P)$, so
that $\rho + P = 0$. This condition is satisfied by a cosmological constant.
Our result is that the only way to get a de Sitter geometry
is with matter fields that exactly mimic a cosmological constant. 
This is of some historical interest in the context of the steady-state 
cosmologies, section 8.4, p. 363. It may seem mysterious that 
we have obtained this result by requiring conservation of energy- 
momentum, but we could also have done it using the Einstein field 
equations. In fact these are not two separate requirements, since 
the field equations require conservation of energy-momentum in 
order to be consistent. 
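

The step from the divergence to the condition on the density and pressure can be spelled out. As a sketch, using the standard expression for the covariant derivative of a mixed tensor and the Christoffel symbol $\Gamma^x{}_{xt} = \dot{a}/a$ of the spatially flat FRW metric in Cartesian coordinates (the summation index $\lambda$ is notation introduced here):

```latex
\nabla_x T^x{}_t
  = \partial_x T^x{}_t
    + \Gamma^x{}_{x\lambda}\, T^\lambda{}_t
    - \Gamma^\lambda{}_{xt}\, T^x{}_\lambda
  = \Gamma^x{}_{xt}\left(T^t{}_t - T^x{}_x\right)
  = \frac{\dot{a}}{a}\,(\rho + P)
```

Since $\dot{a}/a \neq 0$ in an expanding universe, this vanishes only if $\rho + P = 0$.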
The Big Bang singularity in a universe with a cosmological
constant 


On page 332 we discussed the possibility that the Big Bang 
singularity was an artifact of the unrealistically perfect symmetry 
assumed by our cosmological models, and we found that this was 
not the case: the Penrose-Hawking singularity theorems demon- 
strate that the singularity is real, provided that the cosmological 
constant is zero. The cosmological constant is not zero, however. 
Models with a very large positive cosmological constant can also 
display a Big Bounce rather than a Big Bang. If we imagine us- 
ing the Friedmann equations to evolve the universe backward in 
time from its present state, the scaling arguments of example 14 on 
page 132 suggest that at early enough times, radiation and mat- 
ter should dominate over the cosmological constant. For a large 
enough value of the cosmological constant, however, it can happen 
that this switch-over never happens. In such a model, the universe 
is and always has been dominated by the cosmological constant, 
and we get a Big Bounce in the past because of the cosmological 
constant’s repulsion. In this book I will only develop simple cos- 
mological models in which the universe is dominated by a single 
component; for a discussion of bouncing models with both matter 
and a cosmological constant, see Carroll, “The Cosmological Con- 
stant,” http://www.livingreviews.org/lrr-2001-1. By 2008, a 
variety of observational data had pinned down the cosmological con- 
stant well enough to rule out the possibility of a bounce caused by 
a very strong cosmological constant. 


8.2.8 The matter-dominated solution 


Our universe is not perfectly vacuum-dominated, and in the past 
it was even less so. Let us consider the matter-dominated epoch, 
in which the cosmological constant was negligible compared to the 
material sources. The equation of state for nonrelativistic matter 
(p. 132) is 

$P = 0$.

The dilution of the dust with cosmological expansion gives

$\rho \propto a^{-3}$

(see example 23). The Friedmann equations become

$\ddot{a}/a = -\frac{4\pi}{3}\rho$

$(\dot{a}/a)^2 = \frac{8\pi}{3}\rho - k a^{-2}$,

where for compactness $\rho$’s dependence on $a$, with some constant
of proportionality, is not shown explicitly. A static solution, with
constant $a$, is impossible, and $\ddot{a}$ is negative, which we can interpret
in Newtonian terms as the deceleration of the matter in the universe
due to gravitational attraction. There are three cases to consider,
according to the value of $k$.
The closed universe


We've seen that $k = +1$ describes a universe in which the spatial
curvature is positive, i.e., the circumference of a circle is less than 
its Euclidean value. By analogy with a sphere, which is the two- 
dimensional surface of constant positive curvature, we expect that 
the total volume of this universe is finite. 


The second Friedmann equation also shows us that at some value
of $a$, we will have $\dot{a} = 0$. The universe will expand, stop, and then
recollapse, eventually coming back together in a “Big Crunch” which 
is the time-reversed version of the Big Bang. 


Suppose we were to describe an initial-value problem in this 
cosmology, in which the initial conditions are given for all points in 
the universe on some spacelike surface, say t = constant. Since the 
universe is assumed to be homogeneous at all times, there are really 
only three numbers to specify, $a$, $\dot{a}$, and $\rho$: how big is the universe,
how fast is it expanding, and how much matter is in it? But these
three pieces of data may or may not be consistent with the second
Friedmann equation. That is, the problem is overdetermined. In
particular, we can see that for small enough values of $\rho$, we do
not have a valid solution, since the square of $\dot{a}/a$ would have to be
negative. Thus a closed universe requires a certain amount of matter
in it. The present observational evidence (from supernovae and the 
cosmic microwave background, as described above) is sufficient to 
show that our universe does not contain this much matter. 


The flat universe 


The case of k = 0 describes a universe that is spatially flat. 
It represents a knife-edge case lying between the closed and open 
universes. In a Newtonian analogy, it represents the case in which 
the universe is moving exactly at escape velocity; as $t$ approaches
infinity, we have $a \to \infty$, $\rho \to 0$, and $\dot{a} \to 0$. This case, unlike the
others, allows an easy closed-form solution to the motion. Let the
constant of proportionality in the equation of state $\rho \propto a^{-3}$ be fixed
by setting $-4\pi\rho/3 = -c a^{-3}$. The Friedmann equations are

$\ddot{a} = -c a^{-2}$

$\dot{a} = \sqrt{2c}\, a^{-1/2}$.

Looking for a solution of the form $a \propto t^p$, we find that by choosing
$p = 2/3$ we can simultaneously satisfy both equations. The constant
$c$ is also fixed, and we can investigate this most transparently by
recognizing that $\dot{a}/a$ is interpreted as the Hubble constant, $H$, which
is the constant of proportionality relating a far-off galaxy’s velocity 
to its distance. Note that H is a “constant” in the sense that it is 
the same for all galaxies, in this particular model with a vanishing 
cosmological constant; it does not stay constant with the passage 
of cosmological time. Plugging back into the original form of the 
Friedmann equations, we find that the flat universe can only exist if
the density of matter satisfies $\rho = \rho_{crit} = 3H^2/8\pi$ (or $3H^2/8\pi G$
in units with $G \ne 1$). The
observed value of the Hubble constant is about $1/(14 \times 10^9\ \text{years})$,
which is roughly interpreted as the age of the universe, i.e., the
proper time experienced by a test particle since the Big Bang. This
gives $\rho_{crit} \sim 10^{-26}$ kg/m$^3$.
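

As a rough numerical check on the critical density quoted above, the following Python sketch evaluates $3H^2/8\pi G$ in SI units. The function name and the rounded constants are my own illustrative choices, and the input assumes $H = 1/(14 \times 10^9\ \text{years})$ as in the text.

```python
import math

G_NEWTON = 6.674e-11        # m^3 kg^-1 s^-2 (rounded)
SECONDS_PER_YEAR = 3.156e7  # rounded


def critical_density(hubble_time_years):
    """Return 3H^2/(8 pi G) in kg/m^3, taking H = 1/(Hubble time)."""
    hubble = 1.0 / (hubble_time_years * SECONDS_PER_YEAR)  # in s^-1
    return 3.0 * hubble**2 / (8.0 * math.pi * G_NEWTON)


rho_crit = critical_density(14e9)
print(f"rho_crit ~ {rho_crit:.1e} kg/m^3")
```

The result comes out just under $10^{-26}$ kg/m$^3$, which is only a few hydrogen atoms per cubic meter.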


As discussed in subsection 8.2.11, our universe turns out to 
be almost exactly spatially flat. Although it is presently vacuum- 
dominated, the flat and matter-dominated FRW cosmology is a use- 
ful description of its matter-dominated era. 
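

The power-law solution for the flat, matter-dominated case can be checked by direct substitution. The sketch below is only an illustration: it assumes units with $G = 1$, and it uses the normalization $a = K t^{2/3}$ with $K^3 = 9c/2$, which is what I get by substituting the power law into the two equations (this normalization is not stated explicitly in the text). It also checks that the resulting density equals $3H^2/8\pi$.

```python
import math


def flat_dust_solution(t, c):
    """Return (a, adot, addot) for a = K t^(2/3), with K^3 = 9c/2 (assumed)."""
    k = (4.5 * c) ** (1.0 / 3.0)
    a = k * t ** (2.0 / 3.0)
    adot = (2.0 / 3.0) * k * t ** (-1.0 / 3.0)
    addot = -(2.0 / 9.0) * k * t ** (-4.0 / 3.0)
    return a, adot, addot


def friedmann_residuals(t, c):
    """Residuals of addot = -c a^-2 and adot = sqrt(2c) a^-1/2."""
    a, adot, addot = flat_dust_solution(t, c)
    res_accel = addot + c * a ** -2
    res_rate = adot - math.sqrt(2.0 * c) * a ** -0.5
    return res_accel, res_rate


def density_over_critical(t, c):
    """Ratio of rho (from -4 pi rho / 3 = -c a^-3) to 3H^2 / (8 pi)."""
    a, adot, _ = flat_dust_solution(t, c)
    rho = 3.0 * c * a ** -3 / (4.0 * math.pi)
    hubble = adot / a
    return rho / (3.0 * hubble**2 / (8.0 * math.pi))


print(friedmann_residuals(2.0, 0.7))
print(density_over_critical(2.0, 0.7))
```

Both residuals vanish to within rounding error for any positive $t$ and $c$, and the density ratio is 1, independent of the value chosen for $c$.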


The open universe 


The k = —1 case represents a universe that has negative spatial 
curvature, is spatially infinite, and is also infinite in time, i.e., even 
if the cosmological constant had been zero, the expansion of the uni- 
verse would have had too little matter in it to cause it to recontract 
and end in a Big Crunch. 


The time-reversal symmetry of general relativity was discussed 
on p. 223 in connection with the Schwarzschild metric.24 Because
of this symmetry, we expect that solutions to the field equations 
will be symmetric under time reversal (unless asymmetric boundary 
conditions were imposed). The closed universe has exactly this type 
of time-reversal symmetry. But the open universe clearly breaks this 
symmetry, and this is why we speak of the Big Bang as lying in the 
past, not in the future. This is an example of spontaneous symmetry 
breaking. Spontaneous symmetry breaking happens when we try to 
balance a pencil on its tip, and it is also an important phenomenon 
in particle physics. The time-reversed version of the open universe 
is an equally valid solution of the field equations. Another example 
of spontaneous symmetry breaking in cosmological solutions is that 
the solutions have a preferred frame of reference, which is the one at 
rest relative to the cosmic microwave background and the average 
motion of the galaxies. This is referred to as the Hubble flow. 


Size and age of the observable universe Example: 22 

The observable universe is defined by the region from which 
light has had time to reach us since the Big Bang. Many people 
are inclined to assume that its radius in units of light-years must 
therefore be equal to the age of the universe expressed in years. 
This is not true. Cosmological distances like these are not even 
uniquely defined, because general relativity only has local frames 
of reference, not global ones. 


Suppose we adopt the proper distance L defined on p. 326 as
our measure of radius. By this measure, realistic cosmological
models say that our 14-billion-year-old universe has a radius of
46 billion light years.


24 Problem 5 on p. 367 shows that this symmetry is also exhibited by the
Friedmann equations.


For a flat universe, $f = 1$, so by inspecting the FRW metric we
find that a photon moving radially with $ds = 0$ has $|dr/dt| = a^{-1}$,
giving $r = \pm\int_{t_1}^{t_2} dt/a$. Suppressing signs, the proper
distance the photon traverses starting soon after the Big Bang is
$L = a(t_2)\int d\ell = a(t_2)\int dr = a(t_2)\,r = a(t_2)\int_{t_1}^{t_2} dt/a$.


In the matter-dominated case, $a \propto t^{2/3}$, so this results in $L = 3t_2$
in the limit where $t_1$ is small. Our universe has spent most of
its history being matter-dominated, so it’s encouraging that the
matter-dominated calculation seems to do a pretty good job of
reproducing the actual ratio of 46/14 = 3.3 between $L$ and $t_2$.


While we're at it, we can see what happens in the purely
vacuum-dominated case, which has $a \propto e^{t/T}$, where $T = \sqrt{3/\Lambda}$. This
cosmology doesn’t have a Big Bang, but we can think of it as an
approximation to the more recent history of the universe, glued
on to an earlier matter-dominated solution. Here we find
$L = [e^{(t_2 - t_1)/T} - 1]\,T$, where $t_1$ is the time when the switch to
vacuum-domination happened. This function grows more quickly with $t_2$
than the one obtained in the matter-dominated case, so it makes
sense that the real-world ratio of $L/t_2$ is somewhat greater than
the matter-dominated value of 3.
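

The two closed-form results for $L$ in this example can be checked by doing the integral $a(t_2)\int dt/a$ numerically. The following Python sketch is a minimal illustration: the function names, the use of Simpson's rule in the variable $u = \ln t$, and the sample values of $t_1$, $t_2$, and $T$ are all my own choices rather than anything taken from the text.

```python
import math


def proper_horizon_distance(scale_factor, t1, t2, steps=2000):
    """Return a(t2) * integral from t1 to t2 of dt/a(t).

    Uses Simpson's rule in u = ln t (so dt = t du), which handles a very
    small t1 gracefully. Requires t1 > 0 and an even number of steps.
    """
    u1, u2 = math.log(t1), math.log(t2)
    h = (u2 - u1) / steps
    total = 0.0
    for i in range(steps + 1):
        t = math.exp(u1 + i * h)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * t / scale_factor(t)
    return scale_factor(t2) * total * h / 3.0


def matter_scale_factor(t):
    """Matter-dominated case, a proportional to t^(2/3)."""
    return t ** (2.0 / 3.0)


VACUUM_T = 2.0  # illustrative value of T = sqrt(3/Lambda), arbitrary units


def vacuum_scale_factor(t):
    """Vacuum-dominated case, a proportional to exp(t/T)."""
    return math.exp(t / VACUUM_T)


# Matter-dominated: L/t2 approaches 3 as t1 becomes small.
print(proper_horizon_distance(matter_scale_factor, 1e-9, 1.0))

# Vacuum-dominated: compare with [exp((t2 - t1)/T) - 1] T.
print(proper_horizon_distance(vacuum_scale_factor, 1.0, 3.0))
print((math.exp((3.0 - 1.0) / VACUUM_T) - 1.0) * VACUUM_T)
```

For finite $t_1$ the matter-dominated integral gives $3t_2[1 - (t_1/t_2)^{1/3}]$, which is why the limit of small $t_1$ is needed to get exactly $3t_2$.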


The radiation-dominated version is handled in problem 12 on p. 368. 


Local conservation of mass-energy Example: 23 
Any solution to the Friedmann equations is a solution of the field
equations, and therefore locally conserves mass-energy. We saved
work above by applying this condition in advance in the form
$\rho \propto a^{-3}$ to make the dust dilute itself properly with
cosmological expansion. In this example we prove the same proportionality
by explicit calculation.


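Local conservation of mass-energy is expressed by the zero
divergence of the stress-energy tensor, $\nabla_\mu T^{\mu\nu} = 0$. The definition of
the covariant derivative gives

$\nabla_\mu T^{\mu\nu} = \partial_\mu T^{\mu\nu} + \Gamma^\mu_{\mu\lambda} T^{\lambda\nu} + \Gamma^\nu_{\mu\lambda} T^{\mu\lambda}$.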


For convenience, we carry out the calculation at r = 0; if conser- 
vation holds here, then it holds everywhere by homogeneity. 


In a local Cartesian frame $(t', x', y', z')$ at rest relative to the dust,
the stress-energy tensor is diagonal with $T^{t't'} = \rho$. At $r = 0$,
the transformation from FLRW coordinates into these coordinates
doesn’t mix $t$ or $t'$ with the other coordinates, so by the tensor
transformation law we still have $T^{tt} = \rho$.


There are a number of Christoffel symbols involved, but the only
three of relevance that don’t vanish at $r = 0$ turn out to be
$\Gamma^r_{tr} = \Gamma^\theta_{t\theta} = \Gamma^\phi_{t\phi} = \dot{a}/a$. The result is

$\nabla_\mu T^{\mu t} = \partial_t T^{tt} + 3\frac{\dot{a}}{a} T^{tt}$,

or $\dot{\rho}/\rho = -3\dot{a}/a$, which can be rewritten as

$\frac{d \ln \rho}{d \ln a} = -3$,


producing the proportionality originally claimed. 
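

The same proportionality can be seen numerically by integrating $\dot{\rho} = -3(\dot{a}/a)\rho$ for some chosen $a(t)$ and comparing $\rho a^3$ before and after. The sketch below is a minimal illustration using a hand-written fourth-order Runge-Kutta step; the function names and the test case $a = t^{2/3}$ are my own assumptions, and any other smooth, positive $a(t)$ would serve equally well.

```python
def evolve_density(rho0, t0, t1, scale_factor, scale_factor_rate, steps=10000):
    """Integrate d(rho)/dt = -3 (adot/a) rho from t0 to t1 with RK4."""

    def rhs(t, rho):
        return -3.0 * scale_factor_rate(t) / scale_factor(t) * rho

    h = (t1 - t0) / steps
    t, rho = t0, rho0
    for _ in range(steps):
        k1 = rhs(t, rho)
        k2 = rhs(t + h / 2, rho + h * k1 / 2)
        k3 = rhs(t + h / 2, rho + h * k2 / 2)
        k4 = rhs(t + h, rho + h * k3)
        rho += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return rho


def a_of_t(t):
    """Illustrative scale factor, a = t^(2/3)."""
    return t ** (2.0 / 3.0)


def adot_of_t(t):
    """Time derivative of the illustrative scale factor."""
    return (2.0 / 3.0) * t ** (-1.0 / 3.0)


rho_final = evolve_density(1.0, 1.0, 8.0, a_of_t, adot_of_t)
# rho * a^3 should be unchanged: a grows from 1 to 4, so rho falls by 64.
print(rho_final * a_of_t(8.0) ** 3)
```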


8.2.9 The radiation-dominated solution 


For the reasons discussed in example 14 on page 132, the early 
universe was dominated by radiation. The solution of the Friedmann 
equations for this case is taken up in problem 11 on page 368. 


8.2.10 Local effects of expansion 


In this section we discuss the predictions of general relativity 
concerning the effect of cosmological expansion on small, gravita- 
tionally bound systems such as the solar system or clusters of galax- 
ies. The short answer is that in most realistic cosmologies (but not 
necessarily in “Big Rip” scenarios, p. 352) the effect of expansion 
is not zero, but is many orders of magnitude too small to measure. 
Many readers will probably be willing to accept these assertions 
while skipping the following demonstrations. 


To begin with, we observe that there are two qualitatively dis- 
tinct types of effects that could exist. Suppose that a loaf of raisin 
bread is rising. Let’s say that the loaf’s scale factor a doubles by 
the time the yeast’s efforts are spent. By definition, this means that 
the raisins (galaxies, test particles) get farther apart by a factor of 
2. We could imagine that in addition: (1) the strain of expansion 
could cause each raisin to puff up by, say, 1%, and to maintain this 
increased size over the entire course of expansion; or that (2) expansion
could cause each raisin to expand gradually, to 0.2% more
than its original size, then 0.4% more than its original size, and so 
on, until, at the end of the process, each had grown beyond its orig- 
inal size by some amount such as 3.8%, which, while less than the 
100% growth of the inter-raisin distances, was nevertheless nonzero. 
Astronomers refer to the second possibility as a “secular” trend. For 
example, simulations of solar systems often show that over billions 
of years, planets gradually migrate either inward or outward, under 
the influence of their gravitational interactions with other planets. 
As an example of an expansion without a secular trend, asteroids 
may experience a nonnegligible $1/r^2$ force due to radiation pressure
from the sun. The effect is exactly as if the sun’s mass or the gravi- 
tational constant had been slightly reduced. Kepler’s elliptical orbit 
law holds, the law of periods is slightly off, and the orbital radius 
shows zero trend over time. 
If either type of effect exists, an observer in some local inertial
frame will interpret it as a “force.” (The scare quotes are a reminder
that general relativity doesn’t describe gravity as Newton-style
linearly additive, instantaneous action at a distance.) Such a force,
if it exists, cannot simply be proportional to the rate of expansion
$\dot{a}/a$. As a counterexample, the Milne universe is just flat spacetime
described in silly coordinates, and it has $\dot{a} \ne 0$.


It would make more sense for the force to depend on the second 
derivative of the scale factor. To justify this more precisely, imagine 
releasing two test particles, initially separated by some distance that 
is much less than the Hubble scale. They are initially at rest relative 
to the Hubble flow, and no locally gravitating bodies are present. 
As discussed in example 15 on p. 330, the acceleration of one test 
particle relative to the other is given by $(\ddot{a}/a)r$, where $r$ is their
relative displacement. 


Thus if we are to observe any nonzero effects of expansion on a 
local system, they are not really effects of expansion at all, but effects 
of the acceleration of expansion. The factor $\ddot{a}/a$ is on the order of
the inverse square of the age of the universe, i.e., $H^2 \sim 10^{-35}\ \text{s}^{-2}$.
The smallness of this factor is what makes the effect on a system 
such as the solar system so absurdly tiny. 


A human body Example: 24 
Let’s estimate the effect of cosmological expansion on the length 
L of your thigh bone. The body is made of atoms, and for the rea- 
sons given on p. 336, there can be no steady trend in the sizes of 
these atoms or the lengths of the chemical bonds between them. 
The bone experiences a stress due to cosmological expansion, 
but it is in equilibrium, and the strain will disappear if the gravi- 
tational stress is removed (e.g., if other gravitational stresses are 
superimposed on top of the cosmological one in order to cancel 
it). The anomalous acceleration between the ends of the bone
is $(\ddot{a}/a)L$, which is observed as an anomalous stress. Taking
$\ddot{a}/a \sim H^2$, the anomalous acceleration of one end of the bone
relative to the other is $\sim LH^2$. The corresponding compression
or tension is $\sim mLH^2$, where $m$ is your body mass. The resulting
strain is $\epsilon \sim mLH^2/AE$, where $E$ is the Young’s modulus of bone
(about $10^{10}$ Pa) and $A$ is the bone’s cross-sectional area.


Putting in numbers, the result for the strain is about $10^{-40}$, which
is much too small to be measurable by any imaginable technique,
and would in reality be swamped by other effects. Since the sign
of $\ddot{a}$ is currently positive, this strain is tensile, not compressive. In
the earlier, matter-dominated era of the universe, it would have
been compressive.


There is no “secular trend,” i.e., your leg bone is not expanding
over time. It’s in equilibrium, and is simply elongated imperceptibly
compared to the length it would have had without the effect of
cosmological expansion.
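

To see where the order of magnitude of the strain comes from, here is a minimal Python sketch of the estimate $\epsilon \sim mLH^2/AE$. The body mass, bone length, and cross-sectional area are rough values that I have assumed for illustration (they are not given in the text), and $H$ is taken as $1/(14 \times 10^9\ \text{years})$.

```python
SECONDS_PER_YEAR = 3.156e7  # rounded


def expansion_strain(mass_kg, length_m, area_m2, youngs_modulus_pa,
                     hubble_time_years=14e9):
    """Order-of-magnitude strain, epsilon ~ m L H^2 / (A E)."""
    hubble = 1.0 / (hubble_time_years * SECONDS_PER_YEAR)  # in s^-1
    force = mass_kg * length_m * hubble**2                 # ~ m L H^2
    return force / (area_m2 * youngs_modulus_pa)


# Assumed inputs: 70 kg body, 0.5 m bone, 5 cm^2 cross-section, E = 1e10 Pa.
strain = expansion_strain(mass_kg=70.0, length_m=0.5, area_m2=5e-4,
                          youngs_modulus_pa=1e10)
print(f"strain ~ {strain:.0e}")
```

With these inputs the strain comes out at a few times $10^{-41}$, consistent with the order-of-magnitude figure quoted above; changing the assumed dimensions by factors of a few does not change the conclusion.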


Strain on an atomic nucleus Example: 25 
The estimate in example 24 can also be applied to an atomic 
nucleus, which has a nuclear “Young’s modulus” on the order of 
1 MeV /fm? ~ 10° Pa. The result is a strain e ~ 10-*2. 


A Big Rip Example: 26 

Known forms of matter are believed to have equations of state
$P = w\rho$ with $w > -1$. The value for a vacuum-dominated universe
would be $w = -1$. Cosmological observations25 show that
empirically the present-day universe behaves as if it is made out
of stuff with $w = -1.03 \pm 0.16$, and this leaves open the possibility
of $w < -1$. In this case, the solution to the Friedmann equations
gives a scale factor $a(t)$ that blows up to infinity at some finite $t$. In
such a scenario, known as a “Big Rip,” $(d/dt)(\ddot{a}/a)$ diverges, and
any system, no matter how tightly bound, is ripped apart.26 The
vacuum energy responsible for such behavior is referred to as
“phantom energy,” and as of 2019 there is some evidence (p. 357)
to support its existence due to discrepancies in the value of the
Hubble constant.
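

To get a feeling for the time scale of a Big Rip, one can use a simplified model that is not worked out in the text: a spatially flat universe containing only a single component with constant $w < -1$. Under that assumption $\rho \propto a^{-3(1+w)}$, and integrating $(\dot{a}/a)^2 \propto \rho$ gives a scale factor that diverges a time $2/(3|1+w|H)$ after the moment at which the Hubble constant has the value $H$. The sketch below just evaluates that expression; the function name and the choice $1/H = 14$ billion years are illustrative assumptions.

```python
def time_until_big_rip_gyr(w, hubble_time_gyr=14.0):
    """Time until a(t) diverges, in Gyr, for a flat universe dominated by a
    single component with constant equation-of-state parameter w < -1."""
    if w >= -1.0:
        raise ValueError("a Big Rip in this simplified model requires w < -1")
    return 2.0 * hubble_time_gyr / (3.0 * abs(1.0 + w))


# Central value quoted in the text; the error bar also allows w > -1.
print(time_until_big_rip_gyr(-1.03))
```

For $w = -1.03$ this is a few hundred billion years, and the time grows without bound as $w$ approaches $-1$, where the model goes over to the vacuum-dominated case with no rip at all.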


Examples 24-26 show that except under hypothetical extreme 
cosmological conditions, there is no hope of detecting any effect of 
cosmological expansion on systems made of condensed matter. We 
need to look at much larger systems to see any effect, and such 
systems are held together by gravity. For concreteness, let’s keep 
talking about the earth-sun system. Not only is the anomalous 
force on the earth small, it is not guaranteed to produce any secular 
trend, which is what would be most likely to be detectable. The 
direction of the anomalous force on the earth is outward for an 
accelerating cosmological expansion, as we now know is the case for 
the present epoch. As an example in which no secular trend occurs, a 
vacuum-dominated cosmology gives a constant value for $\ddot{a}/a$, so the
outward force is constant. As with the effect of radiation pressure, 
the existence of this constant, outward force is very nearly equivalent 
to rescaling the sun’s gravitational force by a tiny amount, so the 
motion is still very nearly Keplerian, but with a slightly “wrong” 
constant of proportionality in Kepler’s law of periods. The rate of 
change $\dot{r}$ in the radius of the circular orbit is therefore zero in this
case. 


But in most cosmologies $\ddot{a}/a$ is not exactly constant, and the
anomalous force on the earth varies. In a matter-dominated cosmology
with $\Lambda = 0$, in its expanding phase, the force is inward
but decreasing over time, so the orbit expands over time. What
really matters then, is $(d/dt)(\ddot{a}/a)$. If we were free to pick any
function for $a(t)$, we could make up examples in which $\dot{a} > 0$ but
$(d/dt)(\ddot{a}/a) < 0$, so that the solar system would respond to
cosmological expansion by shrinking!


25 Carnero et al., arxiv.org/abs/1104.5426
26 Caldwell et al., arxiv.org/abs/astro-ph/0302506


The function $a(t)$, however, has to satisfy the Friedmann equations,
one of which is (in units with $G \ne 1$)

$\frac{\ddot{a}}{a} = G\left[\frac{1}{3}\Lambda - \frac{4\pi}{3}(\rho + 3P)\right]$.

The present epoch of the universe seems to be well modeled by dark
energy described by a constant $\Lambda$ plus dust with $P \ll \rho$.
Differentiating both sides with respect to time gives

$\frac{d}{dt}\left(\frac{\ddot{a}}{a}\right) \propto \dot{\rho}$,


with a negative constant of proportionality. This ensures that the 
sign of the effect is always as expected from the naive Manichean im- 
age of binding forces struggling against cosmological expansion (or 
perhaps cooperating during the contracting phase of a Big Crunch 
cosmology). 


One way of understanding why this reduces so nicely to a
dependence on $\dot{\rho}$ is the reasoning given in example 15 on p. 330,
in which we found that the relative acceleration of two test
particles A and B in a matter-dominated FRW cosmology could be
calculated accurately by pretending that it was due to the
presence of the dust in any given sphere S surrounding the two
particles. We now let A be the sun, B the earth, and S a sphere
centered on the sun whose radius equals the radius of the earth’s
circular orbit. Due to cosmological expansion, the dust inside S
thins out with time, reducing its density $\rho$. Applying Newton’s
laws to the orbit of the earth gives $\omega^2 r = GM/r^2$, and
conservation of angular momentum results in $\omega r^2 = \text{const}$. A calculation
gives $r/r_0 = [M + (4\pi/3)\rho_0 r_0^3]/[M + (4\pi/3)\rho r^3]$, which results in
$\dot{r}/r_0 \approx -(4\pi/3)G\omega_0^{-2}\dot{\rho}$. Application of the Friedmann equations
yields

$\dot{r}/r_0 = \omega_0^{-2}\,(d/dt)(\ddot{a}/a)$,

which is valid generally, not just for $P = 0$. The $\omega_0^{-2}$ factor shows
that the effect is smaller for more tightly bound systems.


We know that the universe in the present era has $(d/dt)(\ddot{a}/a) > 0$
because $\dot{\rho} < 0$, and for purposes of an order-of-magnitude estimate
we can take $(d/dt)(\ddot{a}/a) \sim H^3$. Plugging in numbers for the
earth-sun system, we find that since the age of the dinosaurs, the radius of
the earth’s orbit has grown by less than the diameter of an atomic
nucleus.27


27 The picturesque image comes from Cooperstock et al.,
http://arxiv.org/abs/astro-ph/9803097v1, who give a different calculation
leading to a result for r exactly equivalent to the one derived here.
f / The angular scale of fluctuations in the cosmic microwave
background can be used to infer the curvature of the universe.


8.2.11 Observation 


Historically, it was believed that the cosmological constant was 
zero, that nearly all matter in the universe was in the form of atoms, 
and that there was therefore only one interesting cosmological pa- 
rameter to measure, which was the average density of matter. This 
density was very difficult to determine, even to within an order of 
magnitude, because most of the matter in the universe probably 
doesn’t emit light, making it difficult to detect. Astronomical dis- 
tance scales were also very poorly calibrated against absolute units 
such as the SI. Starting around 1995, however, a new set of tech- 
niques led to an era of high-precision cosmology. 


Spatial curvature from CMB fluctuations 


A strong constraint on the models comes from accurate mea- 
surements of the cosmic microwave background, especially by the 
1989-1993 COBE probe, and its 2001-2009 successor, the Wilkinson 
Microwave Anisotropy Probe, positioned at the L2 Lagrange point 
of the earth-sun system, beyond the Earth on the line connecting 
sun and earth.28 The temperature of the cosmic microwave
background radiation is not the same in all directions, and it can be
measured at different angles. In a universe with negative spatial 
curvature, the sum of the interior angles of a triangle is less than 
the Euclidean value of 180 degrees. Therefore if we observe a varia- 
tion in the CMB over some angle, the distance between two points 
on the surface of last scattering is actually greater than would have 
been inferred from Euclidean geometry. The distance scale of such 
variations is limited by the speed of sound in the early universe, so 
one can work backward and infer the universe’s spatial curvature 
based on the angular scale of the anisotropies. The measurements 
of spatial curvature are usually stated in terms of the parameter $\Omega$,
defined as the total average density of all source terms in the
Einstein field equations, divided by the critical density that results in
a flat universe. $\Omega$ includes contributions from matter, $\Omega_M$, the
cosmological constant, $\Omega_\Lambda$, and radiation (negligible in the present-day
universe). The results from WMAP, combined with data from
other methods, give $\Omega = 1.005 \pm 0.006$. In other words, the universe
is very nearly spatially flat.


Accelerating expansion from supernova data 


The supernova data described on page 341 complement the CMB
data because they are mainly sensitive to the difference $\Omega_\Lambda - \Omega_M$,
rather than their sum $\Omega = \Omega_\Lambda + \Omega_M$. This is because these data
measure the acceleration or deceleration of the universe’s expansion.
Matter produces deceleration, while the cosmological constant gives
acceleration. Figure g shows some recent supernova data.29 The
horizontal axis gives the redshift factor $z = (\lambda' - \lambda)/\lambda$, where $\lambda'$ is
the wavelength observed on earth and $\lambda$ the wavelength originally
emitted. It measures how fast the supernova’s galaxy is receding
from us. The vertical axis is $\Delta(m - M) = (m - M) - (m - M)_{empty}$,
where $m$ is the apparent magnitude, $M$ is the absolute magnitude,
and $(m - M)_{empty}$ is the value expected in a model of an empty
universe, with $\Omega = 0$. The difference $m - M$ is a measure of distance,
so essentially this is a graph of distance versus recessional velocity, of
the same general type used by Hubble in his original discovery of the
expansion of the universe. Subtracting $(m - M)_{empty}$ on the vertical
axis makes it easier to see small differences. Since the WMAP data
require $\Omega = 1$, we need to fit the supernova data with values of $\Omega_M$
and $\Omega_\Lambda$ that add up to one. Attempting to do so with $\Omega_M = 1$ and
$\Omega_\Lambda = 0$ is clearly inconsistent with the data, so we can conclude
that the cosmological constant is definitely positive.


28 Komatsu et al., 2010, arxiv.org/abs/1001.4538
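

The quantity plotted on the vertical axis can be computed for simple models. The sketch below is only an illustration and relies on standard textbook expressions that are not derived in this section: for a flat universe with matter and a cosmological constant, the luminosity distance in units of $c/H$ is taken to be $(1+z)\int_0^z dz'/\sqrt{\Omega_M(1+z')^3 + \Omega_\Lambda}$, and for the empty universe it is $z(1 + z/2)$. The function names and the sample values $\Omega_M = 0.3$, $\Omega_\Lambda = 0.7$ are my own choices.

```python
import math


def hubble_ratio(z, omega_m, omega_lambda):
    """H(z)/H0 for a flat model with matter and a cosmological constant
    (radiation neglected)."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + omega_lambda)


def lum_distance_flat(z, omega_m, omega_lambda, steps=1000):
    """Luminosity distance in units of c/H0 for a flat universe, using
    Simpson's rule (steps must be even)."""
    h = z / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight / hubble_ratio(i * h, omega_m, omega_lambda)
    return (1.0 + z) * total * h / 3.0


def lum_distance_empty(z):
    """Luminosity distance in units of c/H0 for an empty universe."""
    return z * (1.0 + z / 2.0)


def delta_distance_modulus(z, omega_m, omega_lambda):
    """Delta(m - M): distance modulus relative to the empty universe."""
    ratio = lum_distance_flat(z, omega_m, omega_lambda) / lum_distance_empty(z)
    return 5.0 * math.log10(ratio)


for label, om, ol in [("matter only", 1.0, 0.0), ("with Lambda", 0.3, 0.7)]:
    print(label, round(delta_distance_modulus(0.5, om, ol), 3))
```

At $z = 0.5$ the matter-only flat model gives a negative $\Delta(m - M)$ (supernovae brighter than in an empty universe), while the model with a positive cosmological constant gives a positive value, which is the sense of the discrepancy described above.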


Density of matter from baryonic acoustic oscillations 


Efforts such as the Sloan Digital Sky Survey have made
three-dimensional maps of the density of luminous matter in the
universe.30 The distribution is clumpy. Measuring the average
correlation $\xi$ between the density at points separated by some distance
$s$ (measured in the comoving frame), one would expect that the
function $\xi(s)$ would be largest when $s$ was small and would
simply taper off with increasing $s$. By analogy, we don’t usually find
a Manhattan-style landscape of skyscrapers side by side with an
uninhabited mountainous wilderness. On the other hand, imagine
constructing such a correlation function for houses in a subdivision
in which the roads do not form any regular grid, but zoning
regulations prohibit construction of houses on lots of less than a certain
size. In this situation, there would be zero probability of finding
houses separated by very small distances, and $\xi(s)$ would exhibit
a peak at some larger scale set by the legal code. The actual
results of the sky surveys do show such a peak, which is due to well
known physics referred to as baryon acoustic oscillations (BAO).31
In the early universe, any region of overdensity would tend to create
a radiating sound wave like the bang of a firecracker. Such waves
propagated at a known speed (about half the speed of light) for a
known time (about 400,000 years, until matter became deionized
and transparent to radiation, making it immune to the photon
pressure that drove the oscillations). This leads to a known distance
$s$, which forms a standard ruler at which the peak in $\xi(s)$ occurs.
In cosmological models, these results strongly constrain $\Omega_M$, while
being relatively insensitive to $\Omega_\Lambda$, and they are therefore
complementary to both the supernova data and the CMB results.


29 Riess et al., 2007, arxiv.org/abs/astro-ph/0611572. A larger data set is
analyzed in Kowalski et al., 2008, arxiv.org/abs/0804.4142.

30 Sanchez et al., 2012, arxiv.org/abs/1203.6616

31 Bassett and Hlozek, 2009, arxiv.org/abs/0910.5224


Conclusions about cosmology 


Figure i summarizes what we can conclude about our universe,
parametrized in terms of a model with both $\Omega_M$ and $\Omega_\Lambda$ nonzero.32
We can tell that it originated in a Big Bang singularity, that it will
go on expanding forever, and that it is very nearly flat. Note that
in a cosmology with nonzero values for both $\Omega_M$ and $\Omega_\Lambda$, there is
no strict linkage between the spatial curvature and the question of
recollapse, as there is in a model with only matter and no
cosmological constant; therefore even though we know that the universe
will not recollapse, we do not know whether its spatial curvature is
slightly positive (closed) or negative (open).


Consistency checks 


Astrophysical considerations provide further constraints and con- 
sistency checks. In the era before the advent of high-precision cos- 
mology, estimates of the age of the universe ranged from 10 billion 
to 20 billion years, and the low end was inconsistent with the age 
of the oldest globular clusters. This was believed to be a problem 
either for observational cosmology or for the astrophysical models 
used to estimate the age of the clusters: “You can’t be older than 
your ma.” Current data have shown that the low estimates of the 
age were incorrect, so consistency is restored. 


That only a small fraction of the universe’s matter was luminous 
had been suspected by astronomers such as Zwicky as early as 1933, 
based on the inability to reconcile the observed kinematics with 
Newton’s laws if all matter was assumed to be luminous. 


Dark matter 


Another constraint comes from models of nucleosynthesis dur- 
ing the era shortly after the Big Bang (before the formation of the 
first stars). The observed relative abundances of hydrogen, helium, 
and deuterium cannot be reconciled with the density of “dust” (i.e., 
nonrelativistic matter) inferred from the observational data. If the 
inferred mass density were entirely due to normal “baryonic” matter 
(i.e., matter whose mass consisted mostly of protons and neutrons),
then nuclear reactions in the dense early universe should have pro- 
ceeded relatively efficiently, leading to a much higher ratio of helium 
to hydrogen, and a much lower abundance of deuterium. The con- 
clusion is that most of the matter in the universe must be made of 
an unknown type of exotic non-baryonic matter, known generically 
as “dark matter.” 


The existence of nonbaryonic matter is also required in order to
reconcile the observed density of galaxies with the observed strength
of the CMB fluctuations, and in merging galaxy clusters it has been
observed that the gravitational potential is offset from the radiating
plasma. A 2012 review paper on dark matter is Roos,
arxiv.org/abs/1208.3662.


32 See Carroll, “The Cosmological Constant,”
http://www.livingreviews.org/lrr-2001-1 for a full mathematical
treatment of such models.


A number of experiments are under way to detect dark matter 
directly. As of 2013, the most sensitive experiment has given null 
results: arxiv.org/abs/1310.8214.


At one time it was widely expected that dark matter would con- 
sist of the lightest supersymmetric particle, which might for example 
be the neutralino. However, results from the LHC seem to make it 
unlikely that our universe exhibits supersymmetry, assuming that 
the energy scale is the electroweak scale, which is the only scale that 
has strong motivation. It now appears more likely that dark matter 
consists of some other particle such as sterile neutrinos or axions. 


Current discrepancies 


Even with the inclusion of dark matter, there is a problem
with the abundance of lithium-7 relative to hydrogen, which models
greatly overpredict.33


As of 2019, there is also tension between the values of the
Hubble constant found from distance-ladder techniques and analysis of
the CMB and BAO. The former34 give about $74.2 \pm 1.8$, in units of
km/s/Mpc, while the latter give about $67.5 \pm 0.5$. This may simply
be a case where people always underestimate their systematic errors,
or it may be a sign of new physics causing the universe to accelerate
its expansion more rapidly than predicted by $\Lambda$CDM models.
Proposed solutions involve physical ingredients such as sterile neutrinos,
axions, and phantom energy (example 26, p. 352).
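

One common way to put a number on this kind of tension is to divide the difference between the two values by their uncertainties added in quadrature. The sketch below does this for the figures quoted above; it assumes that the two measurements are independent and that their errors are Gaussian, which is exactly the sort of assumption that fails if systematic errors have been underestimated. The function name is my own.

```python
import math


def tension_in_sigma(value1, error1, value2, error2):
    """Difference between two independent measurements, in units of their
    combined standard deviation (errors added in quadrature)."""
    return abs(value1 - value2) / math.hypot(error1, error2)


# Distance ladder versus CMB + BAO, in km/s/Mpc, as quoted in the text.
print(round(tension_in_sigma(74.2, 1.8, 67.5, 0.5), 2))
```

With the numbers quoted in the text, the discrepancy is between three and four standard deviations.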


8.3 Mach’s principle revisited 
8.3.1 The Brans-Dicke theory 


Mach himself never succeeded in stating his ideas in the form of 
a precisely testable physical theory, and we’ve seen that to the ex- 
tent that Einstein’s hopes and intuition had been formed by Mach’s 
ideas, he often felt that his own theory of gravity came up short. 
The reader has so far encountered Mach’s principle in the context of 
certain thought experiments that are obviously impossible to realize, 
involving a hypothetical universe that is empty except for certain 
apparatus (e.g., section 3.6.2, p. 116). It would be easy, then, to get 
an impression of Mach’s principle as one of those theories that is 
“not even wrong,” i.e., so ill-defined that it cannot even be falsified 
by experiment, any more than Christianity can be. 


33 arxiv.org/abs/0808.2818, arxiv.org/abs/1107.1117
34 https://arxiv.org/abs/1903.07603


But in 1961, Robert Dicke and his student Carl Brans came up
with a theory of gravity that made testable predictions, and that
was specifically designed to be more Machian than general relativity.
Their paper35 is extremely readable, even for the non-specialist.
In this theory, the seemingly foolproof operational definition of a 
Lorentz frame given on p. 26 fails. On the first page, Brans and 
Dicke propose one of those seemingly foolish thought experiments 
about a nearly empty universe: 


The imperfect expression of [Mach’s ideas] in general 
relativity can be seen by considering the case of a space 
empty except for a lone experimenter in his laboratory. 
[...] The observer would, according to general relativity,
observe normal behavior of his apparatus in accordance 
with the usual laws of physics. However, also accord- 
ing to general relativity, the experimenter could set his 
laboratory rotating by leaning out a window and firing 
his 22-caliber rifle tangentially. Thereafter the delicate 
gyroscope in the laboratory would continue to point in a 
direction nearly fixed relative to the direction of motion 
of the rapidly receding bullet. The gyroscope would ro- 
tate relative to the walls of the laboratory. Thus, from 
the point of view of Mach, the tiny, almost massless, 
very distant bullet seems to be more important than the 
massive, nearby walls of the laboratory in determining 
inertial coordinate frames and the orientation of the gy- 
roscope. 


They then proceed to construct a mathematical and more Mach- 
ian theory of gravity. From the Machian point of view, the correct 
local definition of an inertial frame must be determined relative 
to the bulk of the matter in the universe. We want to retain the 
Lorentzian local character of spacetime, so this influence can’t be 
transmitted via instantaneous action at a distance. It must prop- 
agate via some physical field, at a speed less than or equal to c. 
It is implausible that this field would be the gravitational field as 
described by general relativity. Suppose we divide the cosmos up 
into a series of concentric spherical shells centered on our galaxy. 
In Newtonian mechanics, the gravitational field obeys Gauss’s law, 
so the field of such a shell vanishes identically on the interior. In 
relativity, the corresponding statement is Birkhoff’s theorem, which 
states that the Schwarzschild metric is the unique spherically sym- 
metric solution to the vacuum field equations. Given this solution 
in the exterior universe, we can set a boundary condition at the out- 
side surface of the shell, use the Einstein field equations to extend 
the solution through it, and find a unique solution on the interior, 
which is simply a flat space. 


35 C. Brans and R. H. Dicke, “Mach’s Principle and a Relativistic Theory of
Gravitation,” Physical Review 124 (1961) 925
Since the Machian effect can’t be carried by the gravitational
field, Brans and Dicke took up an idea earlier proposed by Pascual
Jordan³⁶ of hypothesizing an auxiliary field φ. The fact that such a
field has never been detected directly suggests that it has no mass or
charge. If it is massless, it must propagate at exactly c, and this also
makes sense because if it were to propagate at speeds less than c,
there would be no obvious physical parameter that would determine
that speed. How many tensor indices should it have? Since Mach’s
principle tries to give an account of inertia, and inertial mass is a
scalar,³⁷ φ should presumably be a scalar (quantized by a spin-zero
particle). Theories of this type are called tensor-scalar theories,
because they use a scalar field in addition to the metric tensor.


The wave equation for a massless scalar field, in the absence of
sources, is simply ∇_i ∇^i φ = 0. The solutions of this wave equation
fall off as φ ∼ 1/r. This is gentler than the 1/r² variation of
the gravitational field, so results like Newton’s shell theorem and
Birkhoff’s theorem no longer apply. If a spherical shell of mass acts
as a source of φ, then φ can be nonzero and varying inside the shell.
The φ that you experience right now as you read this book should be
a sum of wavelets originating from all the masses whose world-lines
intersected the surface of your past light-cone. In a static universe,
this sum would diverge linearly, so a self-consistency requirement for
Brans-Dicke gravity is that it should produce cosmological solutions
that avoid such a divergence, e.g., ones that begin with Big Bangs.


Masses are the sources of the field φ. How should they couple to
it? Since φ is a scalar, we need to construct a scalar as its source,
and the only reasonable scalar that can play this role is the trace of
the stress-energy tensor, T^i_i. As discussed in example 11 on page
319, this vanishes for light, so the only sources of φ are material
particles.³⁸ Even so, the Brans-Dicke theory retains a form of the
equivalence principle. As discussed on pp. 39 and 33, the equivalence
principle is a statement about the results of local experiments,
and φ at any given location in the universe is dominated by
contributions from matter lying at cosmological distances. Objects of
different composition will have differing fractions of their mass that
arise from internal electromagnetic fields. Two such objects will still
follow identical geodesics, since their own effect on the local value


36 Jordan was a member of the Nazi Sturmabteilung or “brown shirts” who 
nevertheless ran afoul of the Nazis for his close professional relationships with 
Jews. 

37 A limit of 5 × 10⁻²³ has been placed on the anisotropy of the inertial mass
of the proton: R.W.P. Drever, “A search for anisotropy of inertial mass using a
free precession technique,” Philosophical Magazine, 6:687 (1961) 683.

38 This leads to an exception to the statement above that all Brans-Dicke
spacetimes are expected to look like Big Bang cosmologies. Any solution of the
GR field equations containing nothing but vacuum and electromagnetic fields
(known as an “electrovac” solution) is also a valid Brans-Dicke spacetime. In
such a spacetime, a constant φ can be set arbitrarily. Such a spacetime is in
some sense not generic for Brans-Dicke gravity.
of φ is negligible. This is unlike the behavior of electrically charged
objects, which experience significant back-reaction effects in curved
space (p. 39). However, the strongest form of the equivalence principle
requires that all experiments in free-falling laboratories produce
identical results, no matter where and when they are carried out.
Brans-Dicke gravity violates this, because such experiments could
detect differences between the value of φ at different locations,
but of course this is part and parcel of the purpose of the theory.


We now need to see how to connect φ to the local notion of
inertia so as to produce an effect of the kind that would tend to
fulfill Mach’s principle. In Mach’s original formulation, this would
entail some kind of local rescaling of all inertial masses, but Brans
and Dicke point out that in a theory of gravity, this is equivalent to
scaling the Newtonian gravitational constant G down by the same
factor. The latter turns out to be a better approach. For one thing,
it has a natural interpretation in terms of units. Since φ’s amplitude
falls off as 1/r, we can write φ ∼ Σ m_i/r, where the sum is over the
past light cone. If we then make the identification of φ with 1/G
(or c²/G in a system where c ≠ 1), the units work out properly, and
the coupling constant between matter and φ can be unitless. If this
coupling constant, notated 1/ω, were not unitless, then the theory’s
predictive value would be weakened, because there would be no way
to know what value to pick for it. For a unitless constant, however,
there is a reasonable way to guess what it should be: “in any sensible
theory,” Brans and Dicke write, “ω must be of the general order of
magnitude of unity.” This is, of course, assuming that the Brans-Dicke
theory was correct. In general, there are other reasonable
values to pick for a unitless number, including zero and infinity. The
limit of ω → ∞ recovers the special case of general relativity. Thus
Mach’s principle, which once seemed too vague to be empirically
falsifiable, comes down to measuring a specific number, ω, which
quantifies how non-Machian our universe is.³⁹


39 Another good technical reason for thinking of φ as relating to the
gravitational constant is that general relativity has a standard prescription for
describing fields on a background of curved spacetime. The vacuum field equations of
general relativity can be derived from the principle of least action, and although
the details are beyond the scope of this book (see, e.g., Wald, General Relativity,
appendix E), the general idea is that we define a Lagrangian density ℒ_G
that depends on the Ricci scalar curvature, and then extremize its integral over
all possible histories of the evolution of the gravitational field. If we want to
describe some other field, such as matter, light, or φ, we simply take the
special-relativistic Lagrangian ℒ_M for that field, change all the derivatives to covariant
derivatives, and form the sum (1/G)ℒ_G + ℒ_M. In the Brans-Dicke theory, we
have three pieces, (1/G)ℒ_G + ℒ_M + ℒ_φ, where ℒ_M is for matter and ℒ_φ for φ.
If we were to interpret φ as a rescaling of inertia, then we would have to have φ
appearing as a fudge factor modifying all the inner workings of ℒ_M. If, on the
other hand, we think of φ as changing the value of the gravitational constant G,
then the necessary modification is extremely simple. Brans and Dicke introduce
one further modification to ℒ_G so that the coupling constant ω between matter
and φ can be unitless. This modification has no effect on the wave equation of
8.3.2 Predictions of the Brans-Dicke theory


Returning to the example of the spherical shell of mass, we can
see based on considerations of units that the value of φ inside should
be ∼ m/r, where m is the total mass of the shell and r is its radius.
There may be a unitless factor out in front, which will depend on ω,
but for ω ∼ 1 we expect this constant to be of order 1. Solving the
nasty set of field equations that result from their Lagrangian, Brans
and Dicke indeed found φ ≈ [2/(3 + 2ω)](m/r), where the constant
in square brackets is of order unity if ω is of order unity. In the
limit of ω → ∞, φ = 0, and the shell has no physical effect on its
interior, as predicted by general relativity.
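The dependence of the shell's effect on the coupling can be made concrete with a few lines of code. This is only a sketch that evaluates the expression quoted above, in units with G = c = 1; the function name shell_phi and the sample values of the coupling are assumptions chosen for illustration.

```python
def shell_phi(m, r, omega):
    """Estimate quoted in the text for the scalar field inside a spherical
    shell of mass m and radius r: phi ~ [2/(3 + 2*omega)] * (m/r)."""
    return 2.0 / (3.0 + 2.0 * omega) * (m / r)

# Unit mass and radius, so the printed number is just the unitless prefactor.
for omega in (1.0, 6.0, 4.0e4, 1.0e12):
    print(f"omega = {omega:g}: prefactor = {shell_phi(1.0, 1.0, omega):.3g}")
```

The prefactor is of order unity for a coupling of order unity and shrinks toward zero as the coupling grows, which is the general-relativistic limit described above.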


Brans and Dicke were also able to calculate cosmological models,
and in a typical model with a nearly spatially flat universe, they
found φ would vary according to

    \phi = 8\pi \rho_o t_o^2 \left(\frac{t}{t_o}\right)^{2/(4+3\omega)} ,

where ρ_o is the density of matter in the universe at time t = t_o.
When the density of matter is small, G is large, which has the same
observational consequences as the disappearance of inertia; this is
exactly what one expects according to Mach’s principle. For ω → ∞,
the gravitational “constant” G = 1/φ really is constant.
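To get a feel for how strongly this power law depends on the coupling, here is a small numerical sketch. It assumes only the two relations stated above, namely that the scalar field grows as t to the power 2/(4 + 3ω) and that G is its reciprocal. The function names and sample couplings are hypothetical, chosen for illustration.

```python
def phi_exponent(omega):
    """Exponent r in phi ∝ t**r, with r = 2/(4 + 3*omega) as in the text."""
    return 2.0 / (4.0 + 3.0 * omega)

def g_ratio(t_over_to, omega):
    """G(t)/G(t_o), assuming G = 1/phi and phi ∝ t**r."""
    return t_over_to ** (-phi_exponent(omega))

def g_fractional_rate(t, omega):
    """(dG/dt)/G = -r/t, in inverse units of whatever t is measured in."""
    return -phi_exponent(omega) / t

# Compare G at half the present age of the universe with its present value.
for omega in (1.0, 6.0, 4.0e4):
    print(f"omega = {omega:g}: G(t_o/2)/G(t_o) = {g_ratio(0.5, omega):.6f}")
```

For a coupling of order unity, G would have been noticeably larger in the past, while for a very large coupling the ratio is indistinguishable from 1.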


Returning to the thought experiment involving the 22-caliber ri- 
fle fired out the window, we find that in this imaginary universe, with 
a very small density of matter, G should be very large. This causes 
a frame-dragging effect from the laboratory on the gyroscope, one 
much stronger than we would see in our universe. Brans and Dicke 
calculated this effect for a laboratory consisting of a spherical shell, 
and although technical difficulties prevented the reliable extrapolation
of their result to ρ_o → 0, the trend was that as ρ_o became
small, the frame-dragging effect would get stronger and stronger,
presumably eventually forcing the gyroscope to precess in lock-step 
with the laboratory. There would thus be no way to determine, once 
the bullet was far away, that the laboratory was rotating at all — 
in perfect agreement with Mach’s principle. 


8.3.3 Hints of empirical support 


Only six years after the publication of the Brans-Dicke theory,
Dicke himself, along with H.M. Goldenberg,⁴⁰ carried out a measurement
that seemed to support the theory empirically. Fifty years
before, one of the first empirical tests of general relativity, which it 
had seemed to pass with flying colors, was the anomalous perihelion 
precession of Mercury. The word “anomalous,” which is often left 
out in descriptions of this test, is required because there are many 


φ in flat spacetime.
40 Dicke and Goldenberg, “Solar Oblateness and General Relativity,” Physical
Review Letters 18 (1967) 313
nonrelativistic reasons why Mercury’s orbit precesses, including
interactions with the other planets and the sun’s oblate shape. It is
only when these other effects are subtracted out that one sees the
general-relativistic effect calculated on page 228. The sun’s oblateness
is difficult to measure optically, so the original analysis of the
data had proceeded by determining the sun’s rotational period by
observing sunspots, and then assuming that the sun’s bulge was the
one found for a rotating fluid in static equilibrium. The result was an
assumed oblateness of about 1 × 10⁻⁵. But we know that the sun’s
dynamics are more complicated than this, since it has convection
currents and magnetic fields. Dicke, who was already a renowned
experimentalist, set out to determine the oblateness by direct optical
measurements, and the result was (5.0 ± 0.7) × 10⁻⁵, which,
although still very small, was enough to put the observed perihelion
precession out of agreement with general relativity by about 8%.
The perihelion precession predicted by Brans-Dicke gravity differs
from the general relativistic result by a factor of (4 + 3ω)/(6 + 3ω).
The data therefore appeared to require ω ≈ 6 ± 1, which would be
inconsistent with general relativity.
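The step from an 8% discrepancy to a coupling of about 6 can be checked by inverting the factor quoted above. The sketch below assumes that "out of agreement by about 8%" means the relativistic part of the precession had to be about 0.92 of the general-relativistic prediction; that reading, the function names, and the 7% to 9% range used to mimic the error bar are assumptions made for illustration.

```python
def precession_factor(omega):
    """Brans-Dicke perihelion precession divided by the general-relativistic
    value, (4 + 3*omega)/(6 + 3*omega), as quoted in the text."""
    return (4.0 + 3.0 * omega) / (6.0 + 3.0 * omega)

def omega_from_factor(f):
    """Invert f = (4 + 3*omega)/(6 + 3*omega) for omega (needs 2/3 < f < 1)."""
    return (6.0 * f - 4.0) / (3.0 * (1.0 - f))

# Coupling implied by a shortfall of 7%, 8%, and 9% relative to general relativity.
for shortfall in (0.07, 0.08, 0.09):
    omega = omega_from_factor(1.0 - shortfall)
    print(f"shortfall {shortfall:.0%}: omega = {omega:.1f}")
```

Under this reading, an 8% shortfall corresponds to a coupling a little above 6, and shifting the shortfall by a percentage point in either direction moves the answer by roughly one unit.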


8.3.4 Mach’s principle is false. 


The trouble with the solar oblateness measurements was that 
they were subject to a large number of possible systematic errors, 
and for this reason it was desirable to find a more reliable test of 
Brans-Dicke gravity. Not until about 1990 did a consensus arise, 
based on measurements of oscillations of the solar surface, that the 
pre-Dicke value was correct. In the interim, the confusion had the 
salutary effect of stimulating a renaissance of theoretical and ex- 
perimental work in general relativity. Often if one doesn’t have an 
alternative theory, one has no reasonable basis on which to design 
and interpret experiments to test the original theory. 


Currently, the best bound on ω is based on measurements⁴¹
of the propagation of radio signals between earth and the
Cassini-Huygens space probe in 2003, which require ω > 4 × 10⁴. This is
so much greater than unity that it is reasonable to take Brans and
Dicke at their word that “in any sensible theory, ω must be of the
general order of magnitude of unity.” Brans-Dicke fails this test, and
is no longer a “sensible” candidate for a theory of gravity. We can
now see that Mach’s principle, far from being a fuzzy piece of
philosophical navel-gazing, is a testable hypothesis. It has been tested
and found to be false, in the following sense. Brans-Dicke gravity
is about as natural a formal implementation of Mach’s principle as
could be hoped for, and it gives us a number ω that parametrizes
how Machian the universe is. The empirical value of ω is so large
that it shows our universe to be essentially as non-Machian as
general relativity.


41 Bertotti, Iess, and Tortora, “A test of general relativity using radio links
with the Cassini spacecraft,” Nature 425 (2003) 374
8.4 Historical note: the steady-state model


From 1948 until around the mid-1960s, the Big Bang theory had 
viable competition in the form of the steady-state model, originated 
by the British trio of Fred Hoyle, Hermann Bondi, and Thomas 
Gold. Legend has it that they came up with the idea after seeing a 
horror movie called Dead of Night, in which events from the begin- 
ning of the story repeat themselves later. This led them to imagine 
that the universe could, although expanding, remain locally in the 
same state at all times. If this were to happen, the empty space 
being opened up between the galaxies would have to be filled back 
in by the spontaneous creation of matter. The model holds a strong 
philosophical appeal because it generalizes the Copernican principle 
so that it applies not just to conditions everywhere in space but also 
at all times. 
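How much matter would have to be created is easy to estimate. If the density stays constant while every volume grows exponentially at three times the Hubble rate, then the mass created per unit volume per unit time is three times the Hubble constant times the density. The sketch below evaluates this under two assumptions that are not taken from the text: a Hubble constant of 70 km/s/Mpc, and a density equal to the critical density of a spatially flat model with no cosmological constant. The physical constants are rounded, and all variable names are illustrative.

```python
import math

G_NEWTON = 6.674e-11         # gravitational constant, m^3 kg^-1 s^-2
METERS_PER_MPC = 3.0857e22
SECONDS_PER_YEAR = 3.156e7
M_HYDROGEN = 1.6735e-27      # mass of a hydrogen atom, kg

H0_KM_S_MPC = 70.0           # assumed Hubble constant
H0 = H0_KM_S_MPC * 1.0e3 / METERS_PER_MPC          # in 1/s

# Assumed density: critical density, rho = 3 H0^2 / (8 pi G).
rho = 3.0 * H0**2 / (8.0 * math.pi * G_NEWTON)     # kg/m^3

# Constant density with volume growing as exp(3 H0 t) requires this
# much new mass per unit volume per unit time.
creation_rate = 3.0 * H0 * rho                     # kg m^-3 s^-1

atoms_per_m3_per_year = creation_rate * SECONDS_PER_YEAR / M_HYDROGEN
years_per_atom = 1.0 / atoms_per_m3_per_year
print(f"about one hydrogen atom per cubic meter every {years_per_atom:.1e} years")
```

With these assumed inputs, the result is on the order of one hydrogen atom per cubic meter per billion years, far too little to detect in a laboratory.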


They published the idea in a pair of back-to-back papers, one 
by Bondi and Gold⁴² and one by Hoyle,⁴³ with comments appended
to the former on the differences between the two approaches. The 
Bondi-Gold paper is especially fun to read, because it is written in 
nontechnical language and shows a type of daring and creative sci- 
ence that is not often encountered today. Much of it reads like a 
catalog of cherished principles of physics that were to be given up, in- 
cluding Lorentz invariance, general relativity, the equivalence princi- 
ple, and possibly the laws of conservation of charge and mass-energy. 
The following is a brief presentation (in slightly different notation) 
of Hoyle’s more mathematically detailed ideas, as sketched in
his original paper. Although Hoyle eventually fleshed out the ideas 
more thoroughly, by the time he had done so the steady-state the- 
ory was already on its way to being crushed under the weight of 
contrary observations. 


Since the model is always to be in the same state, the quantity
ȧ/a must always be the same, i.e., the Hubble constant really
is a constant over time. This requires exponential growth, which 
means that the geometry is that of de Sitter space. In any model 
that assigns an infinite age to the universe, one must explain why 
the universe has not undergone heat death due to the second law 
of thermodynamics. The steady-state model successfully addresses 
this problem, because the exponential expansion is rapid enough to 
prevent thermal equilibrium from happening. 


Hoyle sets out to preserve local conservation of energy-momentum, 


42 “The Steady-State Theory of the Expanding Universe,” MNRAS 108 (1948)
252; adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1948MNRAS.108..252B

43 “A New Model for the Expanding Universe,” MNRAS 108 (1948) 372,
ui.adsabs.harvard.edu/abs/1948MNRAS.108..372H/abstract
without which the Einstein field equations become inconsistent. (This
was Hoyle’s more conservative approach. Bondi and Gold advocated
replacing general relativity completely rather than modifying it.) He
postulates a massless, chargeless, scalar field C, called the “C field,”
the letter “C” standing for “creation.” Suppose that the C field’s
contribution to the stress-energy tensor ends up being that of a
perfect fluid with the same rest frame as the ordinary matter. The rate
of creation of mass-energy is then given by the divergence ∇_μ T^μ_ν,
and we need this to be zero. As shown in example 21 on p. 345, this
requires that our total stress-energy mimic that of a cosmological
constant, with ρ + P = 0. Since the ordinary matter has ρ > 0 and
P > 0, the C field will either need to contribute negative energy
density or negative pressure. We’ll see below that Hoyle’s model is
constructed so that the C field has zero energy and negative
pressure. One will often see the C field described incorrectly as having
negative energy to cancel out the positive energy of the matter being
created. That wouldn’t have worked, because then the total energy
density ρ would always be zero, which is not what we observe. (For
example, the Friedmann equation for ȧ/a relates ρ to the square of
the Hubble constant.)


To understand more about why the theory took the form it did, 
it is helpful to look at some general physical considerations about 
symmetry. As Bondi and Gold admit candidly, any theory of this 
type is likely to violate Lorentz invariance. We can observe an evac- 
uated box and wait for hydrogen atoms to appear. When they ap- 
pear, they’re in some state of motion, at least on the average. This 
state of motion defines a preferred frame. In addition to breaking 
symmetry under Lorentz transformations, the theory lacks time- 
reversal invariance (because matter appears but never disappears) 
and charge-conjugation symmetry (because matter appears but an- 
timatter doesn’t). All of these asymmetries arise because in this 
approach, we try to explain the observed asymmetries of the cosmo- 
logical state of the universe as arising directly from asymmetries in 
the underlying local laws of physics. Such an approach is very dif- 
ferent from the modern one, in which we expect the asymmetries to 
arise from either boundary conditions or instabilities (spontaneous 
symmetry breaking). 


Because the C field is massless and chargeless, we would normally
expect it to obey the wave equation ∇_a ∇^a C = 0. Hoyle’s field does
not, however, evolve according to any Lorentz-invariant dynamical
law. Instead it simply evolves as C = t, where t is a preferred time
coordinate. In any cosmological model in which the matter fields
are modeled as perfect fluids, we have a preferred time coordinate
which is the proper time of an observer at rest with respect to the
fluid, and in the Hoyle model we do assume that this is the time t
we should use in defining C. However, Hoyle’s theory is different
because it gives this preferred time a role in the local laws of physics,
thereby breaking Lorentz invariance.


The value of the scalar field C cannot have any directly
observable effects, since then its time-evolution would distinguish one
epoch of the universe from another. Instead we form the gradient
∇_a C. This gives a vector, which can be interpreted as a velocity
vector defining the preferred frame of reference. An observer in this
frame is at rest relative to the local cosmological fluid, observes the
universe to be homogeneous, and also observes that when new atoms
are created from the vacuum, they are on the average at rest. Thus
∇_a C is observable.


The contribution of the C field to the stress-energy must be a
rank-two tensor, and if we want to construct such a tensor, the only
good possibility that occurs to me⁴⁴ is k∇_a ∇_b C, with k a positive
constant. If the derivatives had been ordinary partial derivatives,
the second derivative would have vanished because C is linear in
time, but the covariant derivatives do not vanish, and in fact the
second derivative is a tensor measuring the rate of cosmological
expansion; the trace ∇^a ∇_a C is the volume expansion Θ defined on
p. 317. For de Sitter space, Θ has a constant value equal to three
times the Hubble constant H_o. We can now see why we could not
take the C field to evolve according to the usual wave equation
∇_a ∇^a C = 0; if it did, then we would have Θ = 0, and the universe
would not be expanding.


When we evaluate the second derivative for the de Sitter metric,
the only nonvanishing Christoffel symbols that occur are Γ^t_{xx} =
Γ^t_{yy} = Γ^t_{zz} = aȧ. We find T^t_t = 0 and T^x_x = T^y_y = T^z_z = kΘ/3.
Thus the C field’s mass-energy density is zero, while for its pressure
we have P = −kΘ/3, which is negative.


For simplicity, we take the ordinary matter to be dust. The total
stress-energy then consists of an energy density ρ that is due only to
the dust, and a negative pressure P that comes only from the C field.
If we require both of these to be constant, take the cosmological
constant to be zero, set a = e^{H_o t} = e^{Θt/3}, and substitute into the
Friedmann equations on p. 329, we find P = −ρ, or k = 3ρ/Θ =
ρ/H_o.
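This bit of algebra can be checked numerically. The sketch below assumes the standard form of the Friedmann equations for a spatially flat model with zero cosmological constant, in units with G = c = 1: the square of ȧ/a equals (8π/3) times the density, and ä/a equals −(4π/3) times the density plus three times the pressure. It differentiates the exponential scale factor by finite differences and solves for the density, the pressure, and the C-field constant. The variable names and the choice of a Hubble constant equal to 1 are arbitrary, made for illustration.

```python
import math

H_o = 1.0              # Hubble constant in arbitrary units (assumed value)
Theta = 3.0 * H_o      # volume expansion for de Sitter space

def a(t):
    """Scale factor of the steady-state model, a = exp(H_o t)."""
    return math.exp(H_o * t)

# Finite-difference estimates of (da/dt)/a and (d^2a/dt^2)/a at an arbitrary time.
t, h = 0.3, 1.0e-4
adot_over_a = (a(t + h) - a(t - h)) / (2.0 * h) / a(t)
addot_over_a = (a(t + h) - 2.0 * a(t) + a(t - h)) / h**2 / a(t)

# First Friedmann equation: (adot/a)^2 = (8 pi / 3) rho fixes the dust density.
rho = 3.0 * adot_over_a**2 / (8.0 * math.pi)

# Second Friedmann equation: addot/a = -(4 pi / 3)(rho + 3 P), solved for P.
P = -(addot_over_a / (4.0 * math.pi / 3.0) + rho) / 3.0

# The C field supplies all the pressure, P = -k Theta / 3, which fixes k.
k_C = -3.0 * P / Theta

print(f"rho = {rho:.6f}, P = {P:.6f}, k = {k_C:.6f}, rho/H_o = {rho / H_o:.6f}")
```

Up to finite-difference error, the pressure comes out equal to minus the density and the C-field constant equals the density divided by the Hubble constant, as stated above.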


Like the cosmological constant, the C field is taken to be a 
universally prescribed property of the vacuum. There is a differ- 
ence, however, because the cosmological constant’s contribution to 
the stress-energy is proportional to the metric, which preserves the 
equivalence principle. As remarked at the end of the Bondi-Gold 
paper, the C field violates the equivalence principle. No calcula- 
tion is spelled out, but they say based on a personal communication 
from Hoyle that the field exerts a force on matter which produces a 


44 The only other obvious possibility would have been something like
−k∇_a C ∇_b C. This would be the stress-energy of a negative-mass dust, which
would be unacceptable for the reasons discussed earlier.
significant acceleration in an atom, but a negligible one in a star.


A claimed selling point of the C field was that it would prevent 
the formation of singularities, including both a Big Bang singularity 
and black holes. This is reasonable, since the C field violates all of 
the energy conditions listed on p. 308 except for the trace energy 
condition. The Penrose singularity theorem depends on the null en- 
ergy condition, and the Hawking singularity theorem requires either 
the strong or the null energy condition. 


Because we can’t make the C field obey the proper wave equation 
for a massless, spin-zero particle, there is no obvious way to make 
up a dynamical law for its evolution in order to replace the fixed 
relation C = t, and we do not expect to have any classical field 
theory for the C field. Hoyle did attempt to add dynamics to the 
model by making it into what is known as a “direct field,” which was 
a type of action-at-a-distance theory that in the 1960s was believed 
to be a good candidate for the fundamental description of the forces 
of physics. (Quantum field theory had not developed to the point 
where it could handle the strong or weak nuclear forces.) Such 
theories were shown to be nonviable as quantum theories in 1963 by 
Currie, Jordan, and Sudarshan. 


The steady-state model began to succumb to contrary evidence 
when Ryle and coworkers counted radio sources and found that they 
did not show the statistical behavior predicted by the model. The 
coup de grace came with the discovery of the cosmic microwave back- 
ground, which demonstrated directly that the universe had once 
been much hotter than it is now. Attempts have been made to 
produce variations on the model that are consistent with these ob- 
servations, but they have not succeeded; for a detailed discussion 
see http://www.astro.ucla.edu/~wright/stdystat.htm. 
Problems


1 Verify, as claimed on p. 299, that the electromagnetic pressure 
inside a medium-weight atomic nucleus is on the order of 10°? Pa. 


2 Is the Big Bang singularity removable by the coordinate
transformation t → 1/t? > Solution, p. 399


3 Verify the claim made on p. 332 that a is a linear function of 
time in the case of the Milne universe, and that k = —1. 


4 Examples 16 on page 337 and 18 on page 340 discussed ropes 
with cosmological lengths. Reexamine these examples in the case of 
the Milne universe. > Solution, p. 399 


5 (a) Show that the Friedmann equations are symmetric under 
time reversal. (b) The spontaneous breaking of this symmetry in 
perpetually expanding solutions was discussed on page 348. Use 
the definition of a manifold to show that this symmetry cannot be 
restored by gluing together an expanding solution and a contracting 
one “back to back” to create a single solution on a single, connected 
manifold. > Solution, p. 399 


6 The Einstein field equations are

    G_{ab} = 8\pi T_{ab} + \Lambda g_{ab} ,

and when it is possible to adopt a frame of reference in which the
local mass-energy is at rest on average, we can interpret the
stress-energy tensor as

    T^\mu{}_\nu = \mathrm{diag}(-\rho, P, P, P) ,

where ρ is the mass-energy density and P is the pressure. Fix some
point as the origin of a local Lorentzian coordinate system. Analyze
the properties of these relations under a reflection such as x → −x
or t → −t. > Solution, p. 399


7 (a) Show that a positive cosmological constant violates the 
strong energy condition in a vacuum. In applying the definition of 
the strong energy condition, treat the cosmological constant as a 
form of matter, i.e., “roll in” the cosmological constant term to the 
stress-energy term in the field equations. (b) Comment on how this 
affects the results of the following paper: Hawking and Ellis, “The 
Cosmic Black-Body Radiation and the Existence of Singularities in 
Our Universe,” Astrophysical Journal, 152 (1968) 25, 
http://articles.adsabs.harvard.edu/f...pJ...152...25H. 
8 In problem 7 on page 209, we analyzed the properties of the
metric

    ds^2 = e^{2gz} dt^2 - dz^2 .


(a) In that problem we found that this metric had the same proper- 
ties at all points in space. Verify in particular that it has the same 
scalar curvature R at all points in space. 

(b) Show that this is a vacuum solution in the two-dimensional (t, z) 
space. 

(c) Suppose we try to generalize this metric to four dimensions as 


    ds^2 = e^{2gz} dt^2 - dx^2 - dy^2 - dz^2 .


Show that this requires an Einstein tensor with unphysical proper- 
ties. 
> Solution, p. 400 


9 Consider the following proposal for defeating relativity’s prohi- 
bition on velocities greater than c. Suppose we make a chain billions 
of light-years long and attach one end of the chain to a particular 
galaxy. At its other end, the chain is free, and it sweeps past the 
local galaxies at a very high speed. This speed is proportional to 
the length of the chain, so by making the chain long enough, we can 
make the speed exceed c. 


Debunk this proposal in the special case of the Milne universe. 


10 Make a rigorous definition of the volume V of the observ- 
able universe. Suppose someone asks whether V depends on the 
observer’s state of motion. Does this question have a well-defined 
answer? If so, what is it? Can we calculate V’s observer-dependence 
by applying a Lorentz contraction? > Solution, p. 401 


11 For a perfect fluid, we have P = wρ, where w is a constant.
The cases w = 0 and w = 1/3 correspond, respectively, to dust and
radiation. Show that for a flat universe with Λ = 0 dominated by a
single component that is a perfect fluid, the solution to the
Friedmann equations is of the form a ∝ t^δ, and determine the exponent
δ. Check your result in the dust case against the one on p. 347,
then find the exponent in the radiation case. Although the w = −1
case corresponds to a cosmological constant, show that the solution
is not of this form for w = −1. > Solution, p. 401


12 Apply the result of problem 11 to generalize the result
of example 22 on p. 348 for the size of the observable universe.
What is the result in the case of the radiation-dominated universe?
> Solution, p. 402


13 The Kantowski-Sachs metric is

    ds^2 = dt^2 - \Lambda^{-1} (d\theta^2 + \sin^2\theta \, d\phi^2) - \exp(2\sqrt{\Lambda}\, t) \, dz^2 .


It describes a universe with the spatial topology of a 3-cylinder. Use 
a computer algebra system such as Maxima to verify the following 
facts.

(a) Any world-line of the form (t, θ, φ, z) = (λ, constants) is a geodesic
parametrized by proper time. (If using Maxima, you will find that
the function cgeodesic() saves time here.)

(b) If two such geodesics are separated only in the z direction, the
distance between them along a surface of fixed t increases exponentially
with t, while geodesics separated only in θ and φ do not recede
from one another.

(c) There are no matter fields, only a cosmological constant Λ.

(d) The Ricci scalar R = −4Λ (+ − − − signature) equals 1/3 of
the value for the de Sitter vacuum-dominated cosmology (sec. 8.2.7, 
p. 341), the factor of 3 occurring because there is expansion along 
only one axis rather than three. 

(e) The vacuum-dominated cosmology found by de Sitter and presented
in the text was supposed to be the unique cosmological solution
of this type. Why is the Kantowski-Sachs metric not a counterexample?
> Solution, p. 402


Chapter 9
Gravitational Waves 


9.1 The speed of gravity 


In Newtonian gravity, gravitational effects are assumed to propagate 
at infinite speed, so that for example the lunar tides correspond at 
any time to the position of the moon at the same instant. This 
clearly can’t be true in relativity, since simultaneity isn’t something 
that different observers even agree on. Not only should the “speed of 
gravity” be finite, but it seems implausible that it would be greater 
than c; in section 2.2 (p. 51), we argued based on empirically well 
established principles that there must be a maximum speed of cause 
and effect. Although the argument was only applicable to special 
relativity, i.e., to a flat spacetime, it seems likely to apply to general 
relativity as well, at least for low-amplitude waves on a flat back- 
ground. As early as 1913, before Einstein had even developed the 
full theory of general relativity, he had carried out calculations in the 
weak-field limit showing that gravitational effects should propagate 
at c. We will work out an argument to this effect (using a different 
technique than Einstein’s) in section 9.2.3. This seems eminently 
reasonable, since (a) it is likely to be consistent with causality, and 
(b) G and c are the only constants with units that appear in the
field equations (obscured by our choice of units, in which G = 1 and
c = 1), and the only velocity-scale that can be constructed from
these two constants is c itself.¹


As shown by the following timeline, Einstein’s prediction was 
surprisingly difficult to verify. 


1913       Einstein predicts gravitational waves traveling at c.
1982       Hulse-Taylor pulsar (pp. 232, 372) seen to lose energy at the
           rate that general relativity predicts for gravitational radiation.
2016-2017  Direct detection of gravitational waves and verification
           that they propagate at c.


¹High-amplitude waves need not propagate at c. For example, general relativity
predicts that a gravitational-wave pulse propagating on a background of
curved spacetime develops a trailing edge that propagates at less than c (Misner,
Thorne, and Wheeler, p. 957). This effect is weak when the amplitude is small
or the wavelength is short compared to the scale of the background curvature.


Why did this process take over a century? Naive arguments
suggest that it should have been much easier. Workers as early as
Newton and Laplace had investigated the consequences of a gravitational
force that propagated at some finite speed. It is easy to
show that, if nonrelativistic ideas about spacetime are retained, the 
predicted results are dramatic and not consistent with observation. 
For example, the earth and moon orbit about their common center 
of mass, which is inside the earth but offset from the earth’s center. 
Suppose that we retain Newton’s ideas about spacetime, but modify 
Newton’s law of gravity to incorporate a time delay, with changes 
in the gravitational field propagating at some speed u. The force 
acting on the moon would then point toward the earth’s location at 
a slightly earlier time, and this force would therefore have a com- 
ponent parallel to the moon’s direction of motion. The force would 
do positive work on the moon and also exert a positive torque, the 
result being that the moon would spiral away. This is not consistent 
with the fact that the earth-moon system has remained fairly stable 
for billions of years, unless we take u to be very large. From the stability
of orbits in the solar system, Laplace estimated $u > 10^{15}$ m/s,
many orders of magnitude greater than c. This seemed to support
the Newtonian picture, in which gravity acts instantaneously
at a distance. A time delay in Newtonian spacetime would also have
been easily detected by twentieth-century measurements using space
probes and radio astronomy.²


The trouble with such arguments is that when we substitute rel- 
ativistic spacetime for Newtonian spacetime, it is no longer expected 
that a time-delayed field will point toward the retarded position of 
the source. For example, if an electric charge moves inertially, and
is observed in a frame in which it is moving, then Lorentz invariance
requires that its electric field lines be straight, and converge on
the charge’s present position in that frame.³ The speed of gravity
therefore turns out to be much harder to measure than Laplace had
believed.


9.2 Gravitational radiation 


9.2.1 Empirical evidence 


The first strong empirical evidence of gravitational waves came
in 1982. The Hulse-Taylor system (page 232) contains two neutron
stars orbiting around their common center of mass, and the period
of the orbit is observed to be decreasing gradually over time (figure
a). This is interpreted as evidence that the stars are losing energy
to radiation of gravitational waves.⁴ As we’ll see in section 9.2.5,
the rate of energy loss is in excellent agreement with the predictions
of general relativity.

An even more dramatic, if less clear-cut, piece of evidence is
Komossa, Zhou, and Lu’s observation⁵ of a supermassive black hole
that appears to be recoiling from its parent galaxy at a velocity of
2650 km/s (projected along the line of sight). They interpret this as
evidence for the following scenario. In the early universe, galaxies
form with supermassive black holes at their centers. When two such
galaxies collide, the black holes can merge. The merger is a violent
process in which intense gravitational waves are emitted, and these
waves carry a large amount of momentum, causing the merged black hole
to recoil at a velocity greater than the escape velocity of the merged
galaxy.


²For an example of an erroneous 2003 claim to have performed such a test,
see Fomalont and Kopeikin, http://arxiv.org/abs/astro-ph/0302294. Their
claims were debunked by Samuel, http://arxiv.org/abs/astro-ph/0304006,
and Will, http://arxiv.org/abs/astro-ph/0301145.

³Crowell, Special Relativity, section 10.4

⁴Stairs, “Testing General Relativity with Pulsar Timing,”
http://relativity.livingreviews.org/Articles/lrr-2003-5/


Although the energy loss from systems such as the Hulse-Taylor
binary provides strong evidence that gravitational waves exist and
carry energy, physicists and astronomers still wanted to detect them
directly, and serious attempts to design and build detectors began
around 1962. The design that finally achieved success used interferometers,
which detect oscillations in the lengths of their own arms.
The first gravitational-wave event was detected, by this method, in
2016, by the Advanced LIGO collaboration.⁶ The event is believed
to have been the result of the collision of two black holes.


⁵http://arxiv.org/abs/0804.4585


a / The Hulse-Taylor pulsar’s orbital motion is gradually losing
energy due to the emission of gravitational waves. The linear
decrease of the period is integrated on this plot, resulting in
a parabola. From Weisberg and Taylor,
http://arxiv.org/abs/astro-ph/0211217.


b / The gravitational waveform observed in 2016 by Advanced LIGO.


[Figure b: plot of the detected waveform; horizontal axis: Time (sec).]

In 2017, an event interpreted as the collision of two neutron stars 
was detected by both gravitational and electromagnetic radiation, 
verifying to high precision that gravitational waves propagate at c. 


Although the earlier 2016 collision of black holes did not directly 
compare the propagation of light and gravity, it provided a different 
kind of check on the propagation of gravitational waves at c. The 
waveform detected in this event was a “chirp” that glided up in 
frequency as the black holes spiraled toward one another and sped 
up. Since the wave was in transit for over a billion years, and the 
waveform lasted a fraction of a second, it follows that gravitational 
waves within this frequency range all travel at very nearly the same 
velocity, i.e., there is a very tight upper limit on the dispersion of 
gravitational waves. 
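As a rough illustration of how tight this limit is, the following Python sketch divides the duration of the signal by the travel time. If two components of the wave leave together and travel a distance L at speeds differing by a small amount, their arrival times t = L/v differ by a fraction equal to the fractional difference in speed. The specific inputs (a travel time of exactly one billion years and a chirp lasting 0.2 s) are assumptions of ours, chosen only to match the phrases “over a billion years” and “a fraction of a second,” and the function name is hypothetical.

```python
# Rough bound on the dispersion of gravitational waves from the chirp argument.
# All numerical inputs are illustrative assumptions, not measured values.

SECONDS_PER_YEAR = 3.156e7


def fractional_speed_spread(transit_time_s, signal_duration_s):
    """Upper bound on |delta v| / v for wave components that left the source
    together and arrived within signal_duration_s of one another.
    Since t = L / v, a small spread in v gives delta t / t = delta v / v."""
    return signal_duration_s / transit_time_s


transit = 1.0e9 * SECONDS_PER_YEAR  # assumed: one billion years in transit
duration = 0.2                      # assumed: chirp lasting 0.2 s
bound = fractional_speed_spread(transit, duration)
print(f"fractional spread in speed is less than about {bound:.1e}")
```

With these assumed inputs the bound is smaller than one part in $10^{17}$.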


A space-based system, LISA, has been proposed
for launch in 2020, but its funding is uncertain. LIGO and LISA
would operate in complementary frequency ranges (figure c). A
selling point of LISA is that if it is launched, there are a number
of sources in the sky, with known properties, that are known to be
easily within its range of sensitivity.⁷ One excellent candidate is HM
Cancri, a pair of white dwarfs with an orbital period of 5.4 minutes,
shorter than that of any other known binary star.⁸


⁶https://dcc.ligo.org/LIGO-P150914/public
⁷G. Nelemans, “The Galactic Gravitational wave foreground,” arxiv.org/abs/0901.1778v1


9.2.2 Energy content


Even without performing the calculations for a system like the 
Hulse-Taylor binary, it is easy to show that if such waves exist, they 
must be capable of carrying away energy. Consider two equal masses 
in highly elliptical orbits about their common center of mass, figure 
d. The motion is nearly one-dimensional. As the masses recede 
from one another, they feel a delayed version of the gravitational 
force originating from a time when they were closer together and the 
force was stronger. The result is that in the near-Newtonian limit, 
they lose more kinetic and gravitational energy than they would have 
lost in the purely Newtonian theory. Now they come back inward 
in their orbits. As they approach one another, the time-delayed 
force is anomalously weak, so they gain less mechanical energy than 
expected. The result is that with each cycle, mechanical energy is 
lost. We expect that this energy is carried by the waves, in the 
same way that radio waves carry the energy lost by a transmitting 
antenna.⁹


⁸Roelofs et al., “Spectroscopic Evidence for a 5.4-Minute Orbital Period in
HM Cancri,” arxiv.org/abs/1003.0658v1

⁹One has to be careful with this type of argument. In particular, one can
obtain incorrect results by attempting to generalize this one-dimensional
argument to motion in more than one dimension, because the effective semi-Newtonian
interaction is not just a time-delayed version of Newton’s law; it also
includes velocity-dependent forces. It is easy to see why such velocity-dependence
must occur in the simpler case of electromagnetism. Suppose that charges A
and B are not at rest relative to one another. In B’s frame, the electric field
from A must come from the direction of the position that an observer comoving
with B would extrapolate linearly from A’s last known position and velocity,


c / Predicted sensitivities of LISA and LIGO to gravitational waves
of various frequencies.


d / As the two planets recede from one another, each feels the
gravitational attraction that the other one exerted in its previous
position, delayed by the time it takes gravitational effects to
propagate at c. At time t, the right-hand planet experiences
the stronger deceleration corresponding to the left-hand planet’s
closer position at the earlier time t′, not its current position
at t. Mechanical energy is not conserved, and the orbits will
decay.


e / The sticky bead argument for the reality of gravitational
waves. As a gravitational wave with the appropriate polarization
passes by, the bead vibrates back and forth on the rod. Friction creates
heat. This demonstrates that gravitational waves carry energy,
and are thus real, observable phenomena.


Not only can these waves remove mechanical energy from a sys- 
tem, they can also deposit energy in a detector, as shown by the 
nonmathematical “sticky bead argument” (figure e), which was orig- 
inated by Feynman in 1957 and later popularized by Bondi. 


Now strictly speaking, we have only shown that gravitational 
waves can extract or donate mechanical energy, but not that the 
waves themselves transmit this energy. The distinction isn’t one 
that normally occurs to us, since we are trained to believe that 
energy is always conserved. But we know that, for fundamental rea- 
sons, general relativity doesn’t have global conservation laws that 
apply to all spacetimes (p. 148). Perhaps the energy lost by the 
Hulse-Taylor system is simply gone, never to reappear, and the en- 
ergy imparted to the sticky bead is simply generated out of nowhere. 
On the other hand, general relativity does have global conservation 
laws for certain specific classes of spacetimes, including, for example, 
a conserved scalar mass-energy in the case of a stationary spacetime 
(p. 266). Spacetimes containing gravitational waves are not sta- 
tionary, but perhaps there is something similar we can do in some 
appropriate special case. 


Suppose we want an expression for the energy of a gravitational
wave in terms of its amplitude. This seems like it ought
to be straightforward. We have such expressions in other classical
field theories. In electromagnetism, we have energy densities
$+(1/8\pi k)|\mathbf{E}|^2$ and $+(1/2\mu_o)|\mathbf{B}|^2$ associated with the electric and
magnetic fields. In Newtonian gravity, we can assign an energy
density $-(1/8\pi G)|\mathbf{g}|^2$ to the gravitational field $\mathbf{g}$; the minus sign
indicates that when masses glom onto each other, they produce a
greater field, and energy is released.
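To get a feel for the sizes involved, the following Python sketch evaluates the electric and Newtonian gravitational energy densities just quoted, in SI units with k taken to be the Coulomb constant. The field strengths used (an electric field of $10^6$ V/m and the earth’s surface field of 9.8 m/s²) are illustrative choices of ours, not values from the text, and the function names are hypothetical.

```python
import math

G = 6.674e-11   # gravitational constant, N m^2 / kg^2
K = 8.988e9     # Coulomb constant, N m^2 / C^2


def electric_energy_density(e_field):
    """Energy density +(1/8 pi k)|E|^2 of an electric field, in J/m^3."""
    return e_field**2 / (8 * math.pi * K)


def newtonian_gravity_energy_density(g_field):
    """Energy density -(1/8 pi G)|g|^2 of a Newtonian gravitational field,
    in J/m^3. The result is negative, as discussed in the text."""
    return -g_field**2 / (8 * math.pi * G)


u_electric = electric_energy_density(1.0e6)          # assumed field: 10^6 V/m
u_gravity = newtonian_gravity_energy_density(9.8)    # assumed field: 9.8 m/s^2
print(f"electric:      {u_electric:.3g} J/m^3")
print(f"gravitational: {u_gravity:.3g} J/m^3")
```

The gravitational value is large in magnitude only because G is so small in SI units; its sign is what matters for the argument above.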


In general relativity, however, the equivalence principle tells us 
that for any gravitational field measured by one observer, we can 
find another observer, one who is free-falling, who says that the local 
field is zero. It follows that we cannot associate an energy with the 
curvature of a particular region of spacetime in any exact way. The 
best we can do is to find expressions that give the energy density (1) 
in the limit of weak fields, and (2) when averaged over a region of 
space that is large compared to the wavelength. These expressions 
are not unique. There are a number of ways to write them in terms 
of the metric and its derivatives, and they all give the same result 
in the appropriate limit. The reader who is interested in seeing the 
subject developed in detail is referred to Carroll’s Lecture Notes on 
General Relativity, http://arxiv.org/abs/gr-qc/9712019. Although
this sort of thing is technically messy, we can accomplish
quite a bit simply by knowing that such results do exist, and that
although they are non-unique in general, they are uniquely well defined
in certain cases. Specifically, when one wants to discuss gravitational
waves, it is usually possible to assume an asymptotically
flat spacetime. In an asymptotically flat spacetime, there is a scalar
mass-energy, called the ADM mass, that is conserved. In this restricted
sense, we are assured that the books balance, and that the
emission and absorption of gravitational waves really does mean the
transmission of a fixed amount of energy.


⁹(continued) as determined by light-speed calculation. This follows from Lorentz invariance,
since this is the direction that will be seen by an observer comoving with A. A
full discussion is given by Carlip, arxiv.org/abs/gr-qc/9909087v2.


9.2.3 Expected properties 


To see what properties we should expect for gravitational radia- 
tion, first consider the reasoning that led to the construction of the 
Ricci and Einstein tensors. If a certain volume of space is filled with
test particles, then the Ricci and Einstein tensors measure the tendency
for this volume to “accelerate;” i.e., $-d^2V/dt^2$ is a measure
of the attraction of any mass lying inside the volume. A distant
mass, however, will exert only tidal forces, which distort a region 
without changing its volume. This suggests that as a gravitational 
wave passes through a certain region of space, it should distort the 
shape of a given region, without changing its volume. 


When the idea of gravitational waves was first discussed, there 
was some skepticism about whether they represented an effect that 
was observable, even in principle. The most naive such doubt is of 
the same flavor as the one discussed in section 8.2.6 about the ob- 
servability of the universe’s expansion: if everything distorts, then 
don’t our meter-sticks distort as well, making it impossible to mea- 
sure the effect? The answer is the same as before in section 8.2.6; 
systems that are gravitationally or electromagnetically bound do 
not have their scales distorted by an amount equal to the change in 
the elements of the metric. 


A less naive reason to be skeptical about gravitational waves is 
that just because a metric looks oscillatory, that doesn’t mean its 
oscillatory behavior is observable. Consider the following example. 


$$ds^2 = dt^2 - \left(1 + \tfrac{1}{10}\sin x\right)dx^2 - dy^2 - dz^2$$


The Christoffel symbols depend on derivatives of the form $\partial_a g_{bc}$, so
here the only nonvanishing Christoffel symbol is $\Gamma^x{}_{xx}$. It is then
straightforward to check that the Riemann tensor
$R^a{}_{bcd} = \partial_c\Gamma^a{}_{db} - \partial_d\Gamma^a{}_{cb} + \Gamma^a{}_{ce}\Gamma^e{}_{db} - \Gamma^a{}_{de}\Gamma^e{}_{cb}$
vanishes by symmetry. Therefore this
metric must really just be a flat-spacetime metric that has been 
subjected to a silly change of coordinates. 
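One way to make the change of coordinates explicit is sketched below. The new coordinate $X$ is a label of our own, not notation from the text. Because the integrand is always positive, $X$ increases monotonically with $x$ and is therefore a legitimate replacement for it.

```latex
X(x) = \int_0^x \sqrt{1 + \tfrac{1}{10}\sin x'}\; dx'
\quad\Longrightarrow\quad
dX^2 = \left(1 + \tfrac{1}{10}\sin x\right) dx^2 ,
\qquad
ds^2 = dt^2 - dX^2 - dy^2 - dz^2 .
```

In the coordinates $(t, X, y, z)$ the metric takes the standard flat-spacetime form, so the oscillation was purely a feature of the original choice of $x$.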


Self-check: R vanishes, but $\Gamma$ doesn’t. Is there a reason for
paying more attention to one or the other? 


To keep the curvature from vanishing, it looks like we need a 
metric in which the oscillation is not restricted to a single variable. 


f / As the gravitational wave propagates in the z direction, the
metric oscillates in the x and y directions, preserving volume.


For example, the metric

$$ds^2 = dt^2 - \left(1 + \tfrac{1}{10}\sin y\right)dx^2 - dy^2 - dz^2$$


does have nonvanishing curvature. In other words, it seems like
we should be looking for transverse waves rather than longitudinal
ones.¹⁰ On the other hand, this metric cannot be a solution
to the vacuum field equations, since it doesn’t preserve volume. It 
also stands still, whereas we expect that solutions to the field equa- 
tions should propagate at the velocity of light, at least for small 
amplitudes. These conclusions are self-consistent, because a wave’s 
polarization can only be constrained if it propagates at c (see p. 129). 


Based on what we’ve found out, the following seems like a metric 
that might have a fighting chance of representing a real gravitational 
wave: 


$$ds^2 = dt^2 - \left(1 + A\sin(z-t)\right)dx^2 - \frac{dy^2}{1 + A\sin(z-t)} - dz^2$$


It is transverse, it propagates at c(= 1), and the fact that $g_{xx}$ is the
reciprocal of $g_{yy}$ makes it volume-conserving. The following Maxima
program calculates its Einstein tensor:


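    load(ctensor);                      /* component tensor package */
    ct_coords:[t,x,y,z];                /* names of the coordinates */
    lg:matrix([1,0,0,0],                /* covariant metric, +--- signature */
              [0,-(1+A*sin(z-t)),0,0],
              [0,0,-1/(1+A*sin(z-t)),0],
              [0,0,0,-1]);
    cmetric();                          /* set up the metric and its inverse */
    einstein(true);                     /* compute and display the Einstein tensor */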


For a representative component of the Einstein tensor, we find

$$G_{tt} = -\frac{A^2\cos^2(z-t)}{2 + 4A\sin(z-t) + 2A^2\sin^2(z-t)}$$

For small values of A, we have $|G_{tt}| \lesssim A^2/2$. The vacuum field
equations require $G_{tt} = 0$, so this isn’t an exact solution. But all
the components of G, not just $G_{tt}$, are of order $A^2$, so this is an
approximate solution to the equations.
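The bound on this component can be checked numerically. The following Python sketch takes the expression for $G_{tt}$ quoted above, scans it over one full cycle of the phase z − t, and compares the largest magnitude found with $A^2/2$. The function names and the number of sample points are our own choices. Only the magnitude is used, so the check does not depend on the overall sign convention.

```python
import math


def g_tt(amplitude, phase):
    """G_tt from the expression quoted in the text, with phase = z - t."""
    s, c = math.sin(phase), math.cos(phase)
    return -(amplitude**2 * c**2) / (2 + 4 * amplitude * s + 2 * amplitude**2 * s**2)


def max_abs_g_tt(amplitude, samples=20_000):
    """Largest |G_tt| found on a uniform grid covering one cycle of the phase."""
    return max(abs(g_tt(amplitude, 2 * math.pi * i / samples)) for i in range(samples))


for amplitude in (0.1, 0.01, 0.001):
    ratio = max_abs_g_tt(amplitude) / (amplitude**2 / 2)
    print(f"A = {amplitude}: max|G_tt| / (A^2/2) = {ratio:.6f}")
```

The denominator equals $2(1 + A\sin(z-t))^2$, so the maximum works out to $A^2/[2(1-A^2)]$. The ratio printed therefore slightly exceeds 1 and approaches 1 as A shrinks, which is why the bound is only approximate.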


¹⁰A more careful treatment shows that longitudinal waves can always be
interpreted as physically unobservable coordinate waves, in the limit of large
distances from the source. On the other hand, it is clear that no such prohibition
against longitudinal waves could apply universally, because such a constraint
can only be Lorentz-invariant if the wave propagates at c (see p. 129), whereas
high-amplitude waves need not propagate at c. Longitudinal waves near the
source are referred to as Type III solutions in a classification scheme due to
Petrov. Transverse waves, which are what we could actually observe in practical
experiments, are type N.


It is also straightforward to check that propagation at approximately
c was a necessary feature. For example, if we replace the
factors of sin(z − t) in the metric with sin(z − 2t), we get a $G_{xx}$ that
is of order unity, not of order $A^2$.


To prove that gravitational waves are an observable effect, we 
would like to be able to display a metric that (1) is an exact solution 
of the vacuum field equations; (2) is not merely a coordinate wave; 
and (3) carries momentum and energy. As late as 1936, Einstein and
Rosen published a paper claiming that gravitational waves were a
mathematical artifact, and did not actually exist.¹¹


9.2.4 Some exact solutions


In this section we study several examples of exact solutions to 
the field equations. Each of these can readily be shown not to be 
a mere coordinate wave, since in each case the Riemann tensor has 
nonzero elements. 


An exact solution Example: 1 
We’ve already seen, e.g., in the derivation of the Schwarzschild 
metric in section 6.2.4, that once we have an approximate solution
to the equations of general relativity, we may be able to find a
series solution. Historically this approach was only used as a last
resort, because the lack of computers made the calculations too 
complex to handle, and the tendency was to look for tricks that 
would make a closed-form solution possible. But today the series 
method has the advantage that any mere mortal can have some 
reasonable hope of success with it — and there is nothing more 
boring (or demoralizing) than laboriously learning someone else’s 
special trick that only works for a specific problem. In this exam- 
ple, we'll see that such an approach comes tantalizingly close 
to providing an exact, oscillatory plane wave solution to the field 
equations. 


Our best solution so far was of the form

$$ds^2 = dt^2 - (1+f)\,dx^2 - \frac{dy^2}{1+f} - dz^2,$$

where $f = A\sin(z-t)$. This doesn’t seem likely to be an exact
solution for large amplitudes, since the x and y coordinates are
treated asymmetrically. In the extreme case of $|A| \ge 1$, there
would be singularities in $g_{yy}$, but not in $g_{xx}$. Clearly the metric will
have to have some kind of nonlinear dependence on f, but we just
haven’t found quite the right nonlinear dependence. Suppose we
try something of this form:

$$ds^2 = dt^2 - (1 + f + cf^2)\,dx^2 - (1 - f + df^2)\,dy^2 - dz^2$$


¹¹Some of the history is related at http://en.wikipedia.org/wiki/Sticky_bead_argument.


This approximately conserves volume, since $(1+f+\ldots)(1-f+\ldots)$
equals unity, up to terms of order $f^2$. The following program tests
this form.


1    load(ctensor);
2    ct_coords:[t,x,y,z];
3    f : A*exp(%i*k*(z-t));
4    lg:matrix([1,0,0,0],
5              [0,-(1+f+c*f^2),0,0],
6              [0,0,-(1-f+d*f^2),0],
7              [0,0,0,-1]);
8    cmetric();
9    einstein(true);


In line 3, the motivation for using the complex exponential rather
than a sine wave in f is the usual one of obtaining simpler expressions;
as we’ll see, this ends up causing problems. In lines 5
and 6, the symbols c and d have not been defined, and have not
been declared as depending on other variables, so Maxima treats
them as unknown constants. The result is $G_{tt} \sim (4d + 4c - 3)A^2$
for small A, so we can make the $A^2$ term disappear by an appropriate
choice of d and c. For symmetry, we choose c = d = 3/8.
With these values of the constants, the result for $G_{tt}$ is of order
$A^4$. This technique can be extended to higher and higher orders
of approximation, resulting in an exact series solution to the field
equations.


Unfortunately, the whole story ends up being too good to be true.
The resulting metric has complex-valued elements. If general relativity
were a linear field theory, then we could apply the usual
technique of forming linear combinations of expressions of the
form $e^{+i\ldots}$ and $e^{-i\ldots}$, so as to give a real result. But the field
equations of general relativity are nonlinear, so the resulting linear
combination is no longer a solution. The best we can do is to
make a non-oscillatory real exponential solution (problem 3).


An exact, oscillatory, non-monochromatic solution Example: 2 
Assume a metric of the form 


$$ds^2 = dt^2 - p(z-t)^2\,dx^2 - q(z-t)^2\,dy^2 - dz^2,$$


where p and q are arbitrary functions. Such a metric would clearly 
represent some kind of transverse-polarized plane wave traveling 
at velocity c(= 1) in the z direction. The following Maxima code 
calculates its Einstein tensor. 


    load(ctensor);
    ct_coords:[t,x,y,z];
    depends(p,[z,t]);        /* p and q are unspecified functions of z and t */
    depends(q,[z,t]);
    lg:matrix([1,0,0,0],
              [0,-p^2,0,0],
              [0,0,-q^2,0],
              [0,0,0,-1]);
    cmetric();
    einstein(true);


The result is proportional to $\ddot{q}/q + \ddot{p}/p$, so any functions p and
q that satisfy the differential equation $\ddot{q}/q + \ddot{p}/p = 0$ will result
in a solution to the field equations. Setting $p(u) = 1 + A\cos u$, for
example, we find that q is oscillatory, but with a period longer than
$2\pi$ (problem 4).
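The slow oscillation of q can be seen numerically. The following Python sketch integrates the differential equation in the form $\ddot{q} = -(\ddot{p}/p)\,q$ with $p(u) = 1 + A\cos u$, using a classical fourth-order Runge-Kutta step, and reports where q first crosses zero. The initial conditions q(0) = 1 and $\dot{q}(0) = 0$, the amplitude A = 0.1, the step size, and the function name are all assumptions made for illustration.

```python
import math


def first_zero_of_q(amplitude, du=1e-3, u_max=200.0):
    """Integrate q'' = -(p''/p) q with p(u) = 1 + A cos u, starting from
    q(0) = 1 and q'(0) = 0, and return the first u at which q crosses zero
    (found by linear interpolation), or None if there is none before u_max."""

    def accel(u, q):
        p = 1.0 + amplitude * math.cos(u)
        p_second = -amplitude * math.cos(u)   # second derivative of p
        return -(p_second / p) * q

    u, q, v = 0.0, 1.0, 0.0
    while u < u_max:
        # Classical RK4 for the first-order system (q, v) with v = dq/du.
        k1q, k1v = v, accel(u, q)
        k2q, k2v = v + 0.5 * du * k1v, accel(u + 0.5 * du, q + 0.5 * du * k1q)
        k3q, k3v = v + 0.5 * du * k2v, accel(u + 0.5 * du, q + 0.5 * du * k2q)
        k4q, k4v = v + du * k3v, accel(u + du, q + du * k3q)
        q_new = q + du * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
        v_new = v + du * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        if q > 0 and q_new <= 0:
            return u + du * q / (q - q_new)
        u, q, v = u + du, q_new, v_new
    return None


zero = first_zero_of_q(0.1)
print("first zero of q at u =", zero)
print("a cosine of period 2*pi would first vanish at u =", math.pi / 2)
```

A function with these initial conditions and a period of $2\pi$ would first vanish at $u = \pi/2$. For small A the first zero found here lies far beyond that point, consistent with the statement that q oscillates with a period longer than $2\pi$.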


An exact, plane, monochromatic wave Example: 3
Any metric of the form

$$ds^2 = (1-h)\,dt^2 - dx^2 - dy^2 - (1+h)\,dz^2 + 2h\,dz\,dt,$$

where $h = f(z-t)\,xy$, and f is any function, is an exact solution of
the field equations (problem 5).


Because h is proportional to xy, this does not appear at first
glance to be a uniform plane wave. One can verify, however, that
all the components of the Riemann tensor depend only on z − t,
not on x or y. Therefore there is no measurable property of this
metric that varies with x and y.


9.2.5 Rate of radiation 


How can we find the rate of gravitational radiation from a system 
such as the Hulse-Taylor pulsar? 


Let’s proceed by analogy. The simplest source of sound waves
is something like the cone of a stereo speaker. Since typical sound
waves have wavelengths measured in meters, the entire speaker is
generally small compared to the wavelength. The speaker cone is
a surface of oscillating displacement $x = x_o\sin\omega t$. Idealizing such
a source to a radially pulsating spherical surface, we have an oscillating
monopole that radiates sound waves uniformly in all directions.
To find the power radiated, we note that the velocity of the
source-surface is proportional to $x_o\omega$, so the kinetic energy of the air
immediately in contact with it is proportional to $\omega^2x_o^2$. The power
radiated is therefore proportional to $\omega^2x_o^2$.


In electromagnetism, conservation of charge forbids the existence
of an oscillating electric monopole. The simplest radiating source is
therefore an oscillating electric dipole $D = D_o\sin\omega t$. If the dipole’s
physical size is small compared to a wavelength of the radiation,
then the radiation is an inefficient process; at any point in space,
there is only a small difference in path length between the positive
and negative portions of the dipole, so there tends to be strong
cancellation of their contributions, which were emitted with opposite
phases. The result is that the wave’s electromagnetic potential four-vector
(section 4.2.5) is proportional to $D_o\omega$, the fields to $D_o\omega^2$, and
the radiated power to $D_o^2\omega^4$. The factor of $\omega^4$ can be broken down
into $(\omega^2)(\omega^2)$, where the first factor of $\omega^2$ occurs for reasons similar
to the ones that explain the $\omega^2$ factor for the monopole radiation
of sound, while the second $\omega^2$ arises because the smaller $\omega$ is, the
longer the wavelength, and the greater the inefficiency in radiation
caused by the small size of the source compared to the wavelength.


[Figure g: a monopole source, labeled $P \sim \omega^2$, and a quadrupole source.]

g / The power emitted by a multipole source of order m is
proportional to $\omega^{2(m+1)}$, when the size of the source is small
compared to the wavelength. The main reason for the $\omega$ dependence
is that at low frequencies, the wavelength is long, so the
number of wavelengths traveled to a particular point in space is
nearly the same from any point in the source; we therefore get
strong cancellation.
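The frequency scalings described so far ($\omega^2$ for a monopole, $\omega^4$ for a dipole) fit the pattern $P \propto \omega^{2(m+1)}$ for a multipole of order m, and the same pattern gives $\omega^6$ for a quadrupole. The short Python sketch below tabulates what this means when the frequency is doubled at fixed source amplitude. The function name is ours, and the sketch assumes the source remains small compared to the wavelength.

```python
def power_ratio(multipole_order, frequency_ratio):
    """Factor by which the radiated power changes when omega is multiplied
    by frequency_ratio at fixed source amplitude, assuming the scaling
    P proportional to omega**(2 * (m + 1)) for a small source of order m."""
    return frequency_ratio ** (2 * (multipole_order + 1))


for order, name in enumerate(("monopole", "dipole", "quadrupole")):
    print(f"{name}: doubling omega multiplies the power by {power_ratio(order, 2.0):g}")
```

Doubling the frequency multiplies the power by 4, 16, and 64 for the three cases, which shows how much more strongly the higher multipoles favor rapidly oscillating sources.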


AM radio Example: 4
Commercial AM radio uses wavelengths of several hundred meters,
so AM dipole antennas are usually orders of magnitude
shorter than a wavelength. This causes severe attenuation in
both transmission and reception. (There are theorems called
reciprocity theorems that relate efficiency of transmission to efficiency
of reception.) Receivers therefore need to use a large
amount of amplification. This doesn’t cause problems, because
the ambient sources of RF noise are attenuated by the short antenna
just as severely as the signal.


Since our universe doesn’t seem to have particles with negative 
mass, we can’t form a gravitational dipole by putting positive and 
negative masses on opposite ends of a stick — and furthermore, 
such a stick will not spin freely about its center, because its center 
of mass does not lie at its center! In a more realistic system, such as 
the Hulse-Taylor pulsar, we have two unequal masses orbiting about 
their common center of mass. By conservation of momentum, the 
mass dipole moment of such a system is constant, so we cannot have 
an oscillating mass dipole. The simplest source of gravitational radiation
is therefore an oscillating mass quadrupole, $Q = Q_o\sin\omega t$.
As in the case of the oscillating electric dipole, the radiation is suppressed
if, as is usually the case, the source is small compared to
the wavelength. The suppression is even stronger in the case of a
quadrupole, and the result is that the radiated power is proportional
to $Q_o^2\omega^6$.


This result has the interesting property of being invariant under
a rescaling of coordinates. In geometrized units, mass, distance,
and time all have the same units, so that $Q_o^2$ has units
of $(\text{length}^3)^2$ while $\omega^6$ has units of $(\text{length})^{-6}$. This is exactly
what is required, because in geometrized units, power is unitless,
energy/time = length/length = 1.


We can also tie the $\omega^6$ dependence to our earlier argument, on
p. 375, for the dissipation of energy by gravitational waves. The
argument was that gravitating bodies are subject to time-delayed
gravitational forces, with the result that orbits tend to decay. This
argument only works if the forces are time-varying; if the forces
are constant over time, then the time delay has no effect. For example,
in the semi-Newtonian limit the field of a sheet of mass is
independent of distance from the sheet. (The electrical analog of
this fact is easily proved using Gauss’s law.) If two parallel sheets
fall toward one another, then neither is subject to a time-varying
force, so there will be no radiation. In general, we expect that there
will be no gravitational radiation from a particle unless the third
derivative of its position $d^3x/dt^3$ is nonzero. (The same is true for
electric quadrupole radiation.) In the special case where the position
oscillates sinusoidally, the chain rule tells us that taking the third
derivative is equivalent to bringing out a factor of $\omega^3$. Since the
amplitude of gravitational waves is proportional to $d^3x/dt^3$, their
energy varies as $(d^3x/dt^3)^2$, or $\omega^6$.


The general pattern we have observed is that for multipole radiation
of order m (0=monopole, 1=dipole, 2=quadrupole), the radiated
power depends on $\omega^{2(m+1)}$. Since gravitational radiation must
always have m = 2 or higher, we have the very steep $\omega^6$ dependence
of power on frequency. This demonstrates that if we want
to see strong gravitational radiation, we need to look at systems
that are oscillating extremely rapidly. For a binary system with
unequal masses of order m, with orbits having radii of order r, we
have $Q_o \sim mr^2$. Newton’s laws give $\omega \sim m^{1/2}r^{-3/2}$, which is essentially
Kepler’s law of periods. The result is that the radiated power
should depend on $(m/r)^5$. Reinserting the proper constants to give
an equation that allows practical calculation in SI units, we have

$$P = k\,\frac{G^4}{c^5}\left(\frac{m}{r}\right)^5,$$

where k is a unitless constant of order unity.


For the Hulse-Taylor pulsar,¹² we have $m \sim 3\times10^{30}$ kg (about
one and a half solar masses) and $r \sim 10^9$ m. The binary pulsar is
made to order for our purposes, since $m/r$ is extremely large compared
to what one sees in almost any other astronomical system. The
resulting estimate for the power is about $10^{24}$ watts.
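As a rough numerical check of this estimate, the following Python sketch evaluates the $(m/r)^5$ scaling with the constants reinserted as $G^4/c^5$. The function name `gw_power_estimate`, the choice $k = 1$, and the rounded values of $G$ and $c$ are assumptions made for illustration; only $m$ and $r$ are taken from the text.

```python
# Order-of-magnitude estimate of gravitational-wave power from a binary,
# P ~ k * (G^4 / c^5) * (m / r)^5, with the unitless constant k ASSUMED to be 1.

G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8     # speed of light, m/s (rounded)


def gw_power_estimate(m, r, k=1.0):
    """Return the estimated radiated power in watts for masses ~m (kg) and radii ~r (m)."""
    return k * G**4 / C**5 * (m / r) ** 5


# Values quoted in the text for the Hulse-Taylor pulsar.
power = gw_power_estimate(m=3e30, r=1e9)
print(f"P ~ {power:.1e} W")
```

Because the power goes as the fifth power of $m/r$, a factor of two in the assumed radius changes the result by a factor of 32, so only the order of magnitude is meaningful.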


The binary’s orbital period is observed to be steadily shortening at a
rate of $\alpha = 2.418\times10^{-12}$ seconds per second. To compare this with
our crude theoretical estimate, we take the Newtonian energy of the
system, $Gm^2/r$, and multiply by $\omega\alpha$, giving $10^{25}$ W, which checks to
within an order of magnitude. A full general-relativistic calculation
reproduces the observed value of $\alpha$ to within the 0.1% error bars of
the data.


During the process of orbital decay for two black holes, the eccentricity
of the orbit is reduced, and the orbit tends to become
nearly circular by the time the holes merge.¹³ When one or more
of the objects is not a black hole, there can also be complicated
coupling to the dynamics of the body.¹⁴


¹² http://arxiv.org/abs/astro-ph/0407149
¹³ Hinder et al., arxiv.org/abs/0710.5167
¹⁴ Ivanov and Papaloizou, arxiv.org/abs/0709.0480


Problems


1 (a) Suppose that a school bus is rotating end over end, and 
therefore emitting gravitational waves. Estimate the frequency at 
which it must rotate, in revolutions per minute, if the power emitted 
is to be 1 pW. 

(b) The power emitted by gravitational waves depends very strongly 
on the frequency, and atomic nuclei are the fastest-rotating objects 
in the universe. Let’s estimate the probability, in a favorable case, 
that a nucleus in an excited rotational state will deexcite not by 
emitting a gamma ray but by emitting gravitational radiation. A 
typical nucleus has an atomic mass of 100, is about 5 fm in ra- 
dius, and has excited states with excitation energies on the order 
of 1 MeV. For a rotational state, this energy can be equated semiclassically
to $\hbar\omega$. Although most excited nuclear states decay with
half-lives of picoseconds or less, excited states are known in which, 
due to various approximate selection rules, the half-life is on the 
order of a year. We assume that the nucleus is nonspherical, as is 
often the case — quantum-mechanically, a sphere cannot rotate, and 
relativistically, a rotating sphere will not emit gravitational waves. 

> Solution, p. 402 


2 (a) Starting on page 21, we have associated geodesics with 
the world-lines of low-mass objects (test particles). Use the Hulse- 
Taylor pulsar as an example to show that the assumption of low 
mass was a necessary one. How is this similar to the issues encoun- 
tered on pp. 39ff involving charged particles? 

(b) Show that if low-mass, uncharged particles did not follow geodesics
(in a spacetime with no ambient electromagnetic fields), it would violate
Lorentz invariance. Make sure that your argument explicitly
invokes the low mass and the lack of charge, because otherwise your
argument is wrong.

> Solution, p. 402


3 Show that the metric $ds^2 = dt^2 - A\,dx^2 - B\,dy^2 - dz^2$ with
3 25 15211 
A=l 2 3 5 
ay af i167 107294797 
3 25 15211 
B=-14 2 3 5 
p+ sf = m6! 10729472 


$f = Ae^{k(t-z)}$


is an approximate solution to the vacuum field equations, provided 
that k is real — which prevents this from being a physically realistic, 
oscillating wave. Find the next nonvanishing term in each series. 


4 Verify the claims made in example 2. Characterize the (somewhat
complex) behavior of the function $q$ obtained when
$p(u) = 1 + A\cos u$.


5 Verify the claims made in example 3 using Maxima. Although 
the result holds for any function f, you may find it more convenient 
to use some specific form of f, such as a sine wave, so that Maxima
will be able to simplify the result to zero at the end. Note that
when the metric is expressed in terms of the line element, there is a 
factor of 2 in the 2h dz dt term, but when expressing it as a matrix, 
the 2 is not present in the matrix elements, because there are two 
elements in the matrix that each contribute an equal amount. 




Appendix 1: Hints and solutions 


Hints 


Hints for Chapter 1 
Page 38, problem 5: Apply the equivalence principle. 


Solutions to Selected Homework Problems 
Solutions for Chapter 1 


Page 38, problem 3: 


Pick two points P1 and P2. By O2, there is another point P3 that is distinct from P1 and
P2. (Recall that the notation [ABC] was defined so that all three points must be distinct.) 
Applying O2 again, there must be a further point P4 out beyond P3, and by O3 this can’t be 
the same as P1. Continuing in this way, we can produce as many points as there are integers. 


Page 38, problem 4: 


(a) If the violation of (1) is tiny, then of course Kip won’t really have any practical way to
violate (2), but the goal here is just to illustrate the idea, so to make things easy, let’s imagine an
unrealistically large violation of (1). Suppose that neutrons have about the same inertial mass as 
protons, but zero gravitational mass, in extreme violation of (1). This implies that neutron-rich 
elements like uranium would have a much lower gravitational acceleration on earth than ones 
like oxygen that are roughly 50-50 mixtures of neutrons and protons. Let’s also simplify by 
making a second unrealistically extreme assumption: let’s say that Kip has a keychain in his 
pocket made of neutronium, a substance composed of pure neutrons. On earth, the keychain 
hovers in mid-air. Now he can release his keychain in the prison cell. If he’s on a planet, it 
will hover. If he’s in an accelerating spaceship, then the keychain will follow Newton’s first law 
(its tendency to do so being measured by its nonzero inertial mass), while the deck of the ship 
accelerates up to hit it. 


(b) It violates O1. O1 says that objects prepared in identical inertial states (as defined by 
two successive events in their motion) are predicted to have identical motion in the future. This 
fails in the case where Kip releases the neutronium keychain side by side with a penny. 


Page 38, problem 5: By the equivalence principle, we can adopt a frame tied to the tossed 
clock, B, and in this frame there is no gravitational field. We see a desk and clock A go by. 
The desk applies a force to clock A, decelerating it and then reaccelerating it so that it comes 
back. We’ve already established that the effect of motion is to slow down time, so clock A reads 
a smaller time interval. 


Page 38, problem 6: (a) Generalizing the expression $gy/c^2$ for the fractional time dilation to
the case of a nonuniform field, we find $\Delta\Phi/c^2$, where $\Phi$ is the Newtonian gravitational potential,
i.e., the gravitational energy per unit mass. The shell theorem gives a gravitational field
$g = GMr/R^3$. Integration shows $\Phi = GMr^2/2R^3$. The difference in the gravitational potential between
these two points, divided by $c^2$, is $\Delta\Phi/c^2 = GM/2c^2R$, which comes out to be $3.5\times10^{-10}$. This is the
fractional difference in clock rates. (b) The probe’s velocity is on the same order of magnitude
as escape velocity from the inner solar system, so very roughly we can say that $v \sim \sqrt{|\Delta\Phi|}$,
where $\Delta\Phi$ is the difference between the gravitational potential at the earth’s orbit and infinity.
This gives a Doppler shift $v/c \sim \sqrt{|\Delta\Phi|}/c$. We saw in part (a) that the gravitational Doppler
shift was $\Delta\Phi/c^2$, which is the square of this quantity, and therefore much smaller.
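The figure in part (a) can be checked numerically with the short Python sketch below. It assumes that the body in question is the Earth and that the two points are its center and surface; the constant names and rounded values are illustrative assumptions, not taken from the text.

```python
# Fractional clock-rate difference between the center and the surface of a
# uniform-density sphere, GM / (2 c^2 R). Earth values are ASSUMED here.

GM_EARTH = 3.986e14   # G times the Earth's mass, m^3/s^2 (rounded)
R_EARTH = 6.371e6     # mean radius of the Earth, m (rounded)
C = 2.998e8           # speed of light, m/s (rounded)


def fractional_rate_difference(gm, radius):
    """Return delta-Phi / c^2 between r = 0 and r = radius inside a uniform sphere."""
    return gm / (2.0 * C**2 * radius)


ratio = fractional_rate_difference(GM_EARTH, R_EARTH)
print(f"{ratio:.2e}")
```

With these inputs the result agrees with the quoted value to two significant figures.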


Page 39, problem 7: (a) In case 1 there is no source of energy, so the particle cannot radiate. 
In cases 2-4, the particle radiates, because there are sources of energy (loss of gravitational energy
in 2 and 3, the rocket fuel in 4). 


(b) In 1, Newton says the object is subject to zero net force, so its motion is inertial. In 
2-4, he says the object is subject to a nonvanishing net force, so its motion is noninertial. This 
matches up with the results of the energy analysis. 


(c) The equivalence principle, as discussed on page 39, is vague, and is particularly difficult 
to apply successfully and unambiguously to situations involving electrically charged objects, due
to the difficulty of defining locality. Applying the equivalence principle in the most naive way, 
we predict that there can be no radiation in cases 2 and 3 (because the object is following a 
geodesic, minding its own business). In case 4, everyone agrees that there will be radiation 
observable back on earth (although it’s possible that it would not be observable to an observer 
momentarily matching velocities with the rocket). The naive equivalence principle says that 1 
and 4 must give the same result, so we should have radiation in 1 as well. These predictions are 
wrong in two out of the four cases, which tells us that we had better either not apply the
equivalence principle to charged objects, or not apply it in such a naive way. 


Page 39, problem 8: 


(a) The dominant form of radiation from the orbiting charge will be the lowest-order nonvanishing
multipole, which in this case is a dipole. The power radiated from a dipole scales like
$d^2\omega^4$, where $d$ is the dipole moment. For an orbit of radius $r$, this becomes $q^2r^2\omega^4$. To find
the reaction force on the charged particle, we can use the relation $p = E/c$ for electromagnetic
waves (section 1.5.5), which tells us that the force is equal to the power, up to a proportionality
constant $c$. Therefore $a_r \propto q^2r^2\omega^4/m$. The gravitational acceleration is $a_g = \omega^2 r$, so we have
$a_r/a_g \propto (q^2/m)\omega^2 r$, or $a_r/a_g \propto (q^2/m)a_g$, where the $a_g$ on the right can be taken as an orbital
parameter, and for a low-earth orbit is very nearly equal to the usual acceleration of gravity at
the earth’s surface.


(b) In SI units, $a_r/a_g \sim (k/c^4)(q^2/m)a_g$, where $k$ is the Coulomb constant.


(c) The result is $10^{-34}$. If one tried to do this experiment in reality, the effect would be
impossible to detect, because the proton would be affected much more strongly by ambient 
electric and magnetic fields than by the effect we’ve calculated. 
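The order of magnitude in part (c) can be reproduced with a few lines of Python. This is only a sketch: it evaluates $(k/c^4)(q^2/m)a_g$ for a proton with $a_g$ taken as 9.8 m/s², and the function name and rounded constants are assumptions made for illustration.

```python
# Ratio of radiation-reaction acceleration to gravitational acceleration,
# a_r / a_g ~ (k / c^4) * (q^2 / m) * a_g, for a proton in low-earth orbit.

K = 8.988e9            # Coulomb constant, N m^2 / C^2 (rounded)
C = 2.998e8            # speed of light, m/s (rounded)
E_CHARGE = 1.602e-19   # proton charge, C (rounded)
M_PROTON = 1.673e-27   # proton mass, kg (rounded)
A_G = 9.8              # ASSUMED orbital acceleration, m/s^2


def radiation_to_gravity_ratio(q, m, a_g):
    """Return the unitless estimate of a_r / a_g in SI units."""
    return (K / C**4) * (q**2 / m) * a_g


ratio = radiation_to_gravity_ratio(E_CHARGE, M_PROTON, A_G)
print(f"a_r/a_g ~ {ratio:.1e}")
```

Doubling both $q$ and $m$ doubles the ratio, which is the $q^2/m$ dependence discussed in the remark that follows.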


Remark: It is odd that the result depends on $q^2/m$, rather than on the charge-to-mass ratio
$q/m$, as is usually the case for a test particle’s trajectory. This means that we get a different
answer if we take two identical objects, place them side by side, and consider them as one big
object! This is not as unphysical as it sounds. The two side-by-side objects radiate coherently,
so the field they radiate is doubled, and the radiated power is quadrupled. Each object’s rate
of orbital decay is doubled, with the extra effect coming from electromagnetic interactions with
the other object’s fields.


Page 83, problem 1:


(a) Let $t$ be the time taken in the lab frame for the light to go from one mirror to the other,
and $t'$ the corresponding interval in the clock’s frame. Then $t' = L$, and $(vt)^2 + L^2 = t^2$, where the
use of the same $L$ in both equations makes use of our prior knowledge that there is no transverse
length contraction. Eliminating $L$, we find the expected expression for $\gamma$, which is independent
of $L$. (b) If the result of (a) were dependent on $L$, then the relativistic time dilation would depend
on the details of the construction of the clock measuring the time dilation. We would be forced
to abandon the geometrical interpretation of special relativity. (c) The effect is to replace $vt$ with
$vt + at^2/2$ as the quantity inside the parentheses in the expression $(\ldots)^2 + L^2 = t^2$. The resulting
correction terms are of higher order in $t$ than the ones appearing in the original expression, and
can therefore be made as small in relative size as desired by shortening the time $t$. But this is
exactly what happens when we make the clock sufficiently small.


Page 84, problem 2: 


Since gravitational redshifts can be interpreted as gravitational time dilations, the gravitational
time dilation is given by the difference in gravitational potential $g\,dr$ (in units where
$c = 1$). The kinematic effect is given by $d\gamma = d(v^2)/2 = \omega^2 r\,dr$. The ratio of the two effects is
$\omega^2 R\cos\lambda/g$, where $R$ is the radius of the Earth and $\lambda$ is the latitude. Tokyo is at 36 degrees
latitude, and plugging this in gives the claimed result.


Page 84, problem 3: 


(a) Reinterpret figure j on p. 81 as a picture of a Sagnac ring interferometer. Let light waves
1 and 2 move around the loop in opposite senses. Wave 1 takes time $t_{1i}$ to move inward along
the crack, and time $t_{1o}$ to come back out. Wave 2 takes times $t_{2i}$ and $t_{2o}$. But $t_{1i} = t_{2i}$ (since
the two world-lines are identical), and similarly $t_{1o} = t_{2o}$. Therefore creating the crack has no
effect on the interference between 1 and 2, and splitting the big loop into two smaller loops
merely splits the total phase shift between them. (b) For a circular loop of radius $r$, the time
of flight of each wave is proportional to $r$, and in this time, each point on the circumference
of the rotating interferometer travels a distance $v(\text{time}) = (\omega r)(\text{time}) \propto r^2$. (c) The effect is
proportional to area, and the area is zero. (d) The light clock in c has its two ends synchronized
according to the Einstein prescription, and the success of this synchronization verifies Einstein’s
assumption of commutativity in this particular case. If we make a Sagnac interferometer in the
shape of a triangle, then the Sagnac effect measures the failure of Einstein’s assumption that all
three corners can be synchronized with one another.


Page 84, problem 5: 
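Here is the program:
/* Lorentz boosts with rapidities h1 and h2 */
L1:matrix([cosh(h1),sinh(h1)],[sinh(h1),cosh(h1)]);
L2:matrix([cosh(h2),sinh(h2)],[sinh(h2),cosh(h2)]);
/* compose the boosts and expand to second order in each rapidity */
T:L1.L2;
taylor(taylor(T,h1,0,2),h2,0,2);


Writing $\eta_1$ and $\eta_2$ for h1 and h2, the diagonal components of the result are both
$1 + \eta_1^2/2 + \eta_2^2/2 + \eta_1\eta_2 + \ldots$ Everything after
the 1 is nonclassical. The off-diagonal components are $\eta_1 + \eta_2 + \eta_1\eta_2^2/2 + \eta_1^2\eta_2/2 + \ldots$, with the
third-order terms being nonclassical.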
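The same result can be cross-checked numerically without a computer algebra system. The Python sketch below multiplies two boost matrices for small rapidities and compares the product with the truncated series; the sample rapidities 0.01 and 0.02 and the helper names `boost` and `matmul` are hypothetical choices made for illustration.

```python
import math


def boost(eta):
    """2x2 Lorentz boost matrix for rapidity eta (units with c = 1)."""
    return [[math.cosh(eta), math.sinh(eta)],
            [math.sinh(eta), math.cosh(eta)]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


eta1, eta2 = 0.01, 0.02   # hypothetical small rapidities
T = matmul(boost(eta1), boost(eta2))

# Series truncated at second order in each rapidity separately.
diag_series = 1 + eta1**2 / 2 + eta2**2 / 2 + eta1 * eta2
offdiag_series = eta1 + eta2 + eta1 * eta2**2 / 2 + eta1**2 * eta2 / 2

print(T[0][0] - diag_series, T[0][1] - offdiag_series)
```

The residuals are small but nonzero because the double truncation discards terms such as $\eta_1^3/6$. The exact product is a boost with rapidity $\eta_1 + \eta_2$.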


Solutions for Chapter 3
Page 120, problem 1:


(a) As discussed in example 5 on page 96, a cylinder has local, intrinsic properties identical 
to those of flat space. The cylindrical model therefore has the same properties L1-L5 as our 
standard model of Lorentzian space, provided that L1-L5 are taken as purely local statements. 


(b) The cylindrical model does violate L3. In this model, the doubly-intersecting world-lines 
described by property G will not occur if the world-lines are oriented exactly parallel to the 
cylinder. This picks out a preferred direction in space, violating L3 if L3 is interpreted globally. 
Frames moving parallel to the axis have different properties from frames moving perpendicular 
to the axis. 


But just because this particular model violates the global interpretation of L3, that doesn’t 
mean that all models of G violate it. We could instead construct a model in which space wraps 
around in every direction. In the 2+1-dimensional case, we can visualize the spatial part of such 
a model as the surface of a doughnut embedded in three-space, with the caveat that we don’t 
want to think of the doughnut hole’s circumference as being shorter than the doughnut’s outer 
radius. Giving up the idea of a visualizable model embedded in a higher-dimensional space, 
we can simply take a three-dimensional cube and identify its opposite faces. Does this model 
violate L3? It’s not quite as obvious, but actually it does. The spacelike great-circle geodesics of 
this model come in different circumferences, with the shortest being those parallel to the cube’s 
axes. 


We can’t prove by constructing a finite number of models that every possible model of G 
violates L3. The two models we’ve found, however, can make us suspect that this is true, and 
can give us insight into how to prove it. For any pair of world-lines that provide an example of 
G, we can fix a coordinate system K in which the two particles started out at A by flying off 
back-to-back. In this coordinate system, we can measure the sum of the distances traversed by 
the two particles from A to B. (If homogeneity, L1, holds, then they make equal contributions
to this sum.) The fact that the world-lines were traversed by material particles means that we 
can, at least in principle, visit every point on them and measure the total distance using rigid 
rulers. We call this the circumference of the great circle AB, as measured in a particular frame. 
The set of all such circumferences has some greatest lower bound. If this bound is zero, then 
such geodesics can exist locally, and this would violate even the local interpretation of L1-L5. 
If the bound is nonzero, then let’s fix a circle that has this minimum circumference. Mark the 
spatial points this circle passes through, in the frame of reference defined above. This set of 
points is a spacelike circle of minimum radius. Near a given point on the circle, the circle looks 
like a perfectly straight axis, whose orientation is presumably random. Now let some observer 
K’ travel around this circle at a velocity v relative to K, measuring the circumference with a
Lorentz-contracted ruler. The circumference is greater than the minimal one measured by K. 
Therefore for any axis with a randomly chosen orientation, we have a preferred rest frame in 
which the corresponding great circle has minimum circumference. This violates L3. Thanks to 
physicsforums user atyy for suggesting this argument. 


More detailed discussions of these issues are given in Bansal et al., arxiv.org/abs/gr-qc/0503070v1,
and Barrow and Levin, arxiv.org/abs/gr-qc/0101014v1.


Page 120, problem 2: 


In these Cartesian coordinates, the metric is diagonal and has elements with opposite signs.
Due to the SI units, it is not possible for the two nonzero elements of the metric to have the
same units. Let’s arbitrarily fix $g_{tt} = 1$. Then we must have $g_{xx} = -c^{-2}$. Using the metric to
lower the index on $ds^a$, we find $ds_a = (ds^t, -c^{-2}\,ds^x)$.


Page 121, problem 7:


According to the Einstein summation convention, the repeated index implies a sum, so the 
result is a scalar. As shown in example 15 on p. 107, each term in the sum equals 1, so the 
result is unitless and simply equals the number of dimensions. 


Page 120, problem 3: 


(a) The first two violate the rule that summation only occurs over up-down pairs of indices.
The third expression would result in a quantity that couldn’t be classified as either contravariant
or covariant. (b) In differential geometry, different elements of the same tensor can have different
units. If, as remarked in the problem, $U_{aa}$ were to be interpreted as a sum, this would mean
adding things that had different units. In the expression $p^a - q_a$, even if we suppose that
$p$ and $q$ both represent the same type of physical quantity, e.g., force, their covariant and
contravariant versions would not necessarily have the same units unless we happened to be
working in coordinates such that the metric was unitless.


Page 120, problem 4: 


Assuming the mountaineer uses radians and the metric system, the coordinates have units
1, 1, and m (where 1 means a unitless quantity and m means meters — radians are not really
units). Therefore the units of an infinitesimal difference in coordinates $ds^a$ are also (1, 1, m).
Because the coordinates are orthogonal, the metric is diagonal. If we want $g_{ab}\,ds^a\,ds^b$ to have
units of m², then its diagonal elements must have units of (m², m², 1). The upper-index metric
$g^{ab}$ is the inverse of its lower-index version $g_{ab}$, so its units are (m⁻², m⁻², 1). Mechanical work
has units of N·m, so given $dW = F_a\,ds^a$, the units of $F_a$ must be (N·m, N·m, N). Raising the
index on the force using $g^{ab}$ gives (N/m, N/m, N).


Page 120, problem 5: 


The only aspect of the geometrical representation that needs to be changed is that instead of 
representing an upper-index vector using a pair of parallel lines, we should use a pair of parallel 
planes. 


Page 121, problem 8: 


The coordinate $T$ would have a discontinuity of $2\pi\omega r^2/(1 - \omega^2r^2)$. Reinserting factors of $c$
to make it work out in SI units, we have $2\pi\omega r^2c^{-2}/(1 - \omega^2r^2c^{-2}) = 207$ ns. The exact error in
position that would result is dependent on the geometry of the current position of the satellites,
but it would be on the order of $c\Delta T$, which is ~ 100 m. This is considerably worse than civilian
GPS’s 20-meter error bars.
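The 207 ns figure can be reproduced with the Python sketch below. It assumes that $r$ is the Earth's equatorial radius and $\omega$ the Earth's sidereal rotation rate; those rounded values, and the function name `time_discontinuity`, are assumptions made for illustration rather than inputs stated in the solution.

```python
import math

OMEGA_EARTH = 7.292e-5   # Earth's rotation rate, rad/s (rounded)
R_EQUATOR = 6.378e6      # ASSUMED radius r: Earth's equatorial radius, m
C = 2.998e8              # speed of light, m/s (rounded)


def time_discontinuity(omega, r):
    """Return 2*pi*omega*r^2/c^2 / (1 - omega^2 r^2 / c^2), in seconds."""
    beta_squared = (omega * r / C) ** 2
    return 2.0 * math.pi * omega * r**2 / C**2 / (1.0 - beta_squared)


delta_t = time_discontinuity(OMEGA_EARTH, R_EQUATOR)
position_error = C * delta_t   # rough scale of the resulting position error, m

print(f"{delta_t * 1e9:.0f} ns, {position_error:.0f} m")
```

The denominator differs from 1 by only a few parts in $10^{12}$, so the discontinuity is set almost entirely by $2\pi\omega r^2/c^2$.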
Page 121, problem 9: 


The process that led from the Euclidean metric of example 8 on page 103 to the non- 
Euclidean one of equation [3] on page 112 was not just a series of coordinate transformations. 
At the final step, we got rid of the variable t, reducing the number of dimensions by one. 
Similarly, we could take a Euclidean three-dimensional space and eliminate all the points except 
for the ones on the surface of the unit sphere; the geometry of the embedded sphere is non- 
Euclidean, because we’ve redefined geodesics to be lines that are “as straight as they can be” 
(i.e., have minimum length) while restricted to the sphere. In the example of the carousel, the 
final step effectively redefines geodesics so that they have minimal length as determined by a 
chain of radar measurements.


Page 121, problem 10:


(a) No. The track is straight in the lab frame, but curved in the rotating frame. Since the
spatial metric in the rotating frame is symmetric with respect to clockwise and counterclockwise,
the metric can never result in geodesics with a specific handedness. (b) The $d\theta'^2$ term of the
metric blows up here. A geodesic connecting point A, at $r = 1/\omega$, with point B, at $r < 1/\omega$,
must have minimum length. This requires that the geodesic be directly radial at A, so that
$d\theta' = 0$; for if not, then we could vary the curve slightly so as to reduce $|d\theta'|$, and the resulting
increase in the $dr^2$ term would be negligible compared to the decrease in the $d\theta'^2$ term. (c) No.
As we found in part a, laser beams can’t be used to form geodesics.


Page 122, problem 11: A and B are equivalent under a Lorentz transformation, so the Penrose 
result clearly includes B. The outline of the sphere is still spherical. C is also equivalent to A 
and B, because there are only two effects (Lorentz contraction and optical aberration), and both 
of them depend only on the observer’s instantaneous velocity, not on his history of motion. D 
is not a well-defined question. When asking this question, we’re implicitly assuming that the 
sphere has some “real” shape, which appears different because the sphere has been set into 
motion. But you can’t impart an angular acceleration to a perfectly rigid body in relativity. 


Page 122, problem 12: Applying the de Broglie relations to the relativistic identity
$m^2 = E^2 - p^2$, we find the dispersion relation to be $m^2 = \omega^2 - k^2$. The group velocity is
$d\omega/dk = \sqrt{1 - (m/\omega)^2}$. Applying the de Broglie relations to this, and associating the group velocity
with $v$, we have $v = \sqrt{1 - (m/E)^2}$, which is equivalent to $E = m\gamma$. Since $E = m\gamma$ has been
established, and $m^2 = E^2 - p^2$ was assumed, it follows immediately that $p = m\gamma v$ holds as well.
All hell breaks loose if we try to associate $v$ with the phase velocity, which is $\omega/k = \sqrt{1 + (m/k)^2}$.
For example, the phase velocity is always greater than $c$ (= 1) for $m > 0$.


Solutions for Chapter 4 


Page 156, problem 1: 


The four-velocity of a photon (or of any massless particle) is undefined. One way to see this
is that $d\tau = 0$ for a massless particle, so $v^i = dx^i/d\tau$ involves division by zero. Alternatively,
$p^i = mv^i$ would always give an energy and momentum of zero if $v^i$ were well defined, yet we
know that massless particles can have both energy and momentum.


Page 156, problem 2: 


To avoid loss of precision in numerical operations like subtracting $v$ from 1, it’s better
to derive an ultrarelativistic approximation. The velocity corresponding to a given $\gamma$ is
$v = \sqrt{1 - \gamma^{-2}} \approx 1 - 1/2\gamma^2$, so $1 - v \approx 1/2\gamma^2 = (m/E)^2/2$. Reinserting factors of $c$ so as to make
the units come out right in the SI system, this becomes $(mc^2/E)^2/2 = 9\times10^{-9}$.
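The loss of precision mentioned here is easy to demonstrate in double-precision arithmetic. The Python sketch below compares the direct subtraction with the ultrarelativistic approximation; the function names and the sample values of $\gamma$ are hypothetical and are not the values from the problem.

```python
import math


def one_minus_v_naive(gamma):
    """Compute 1 - v by direct subtraction (units with c = 1)."""
    return 1.0 - math.sqrt(1.0 - gamma**-2)


def one_minus_v_approx(gamma):
    """Ultrarelativistic approximation 1 - v ~ 1 / (2 gamma^2)."""
    return 0.5 * gamma**-2


gamma = 1.0e9   # hypothetical, extremely relativistic particle
naive = one_minus_v_naive(gamma)
approx = one_minus_v_approx(gamma)
print(naive, approx)
```

For this $\gamma$ the quantity $\gamma^{-2}$ is smaller than the spacing between floating-point numbers near 1, so the direct subtraction returns exactly zero while the approximation keeps full relative precision. For moderate values such as $\gamma = 100$, the two methods agree closely.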


Page 156, problem 8: 


The time on the clock is given by $s = \int ds$, where the integral is over the clock’s world-line.
The quantity ds is our prototypical Lorentz scalar, so it’s frame-independent. An integral is 
just a sum, and the tensor transformation laws are linear, so the integral of a Lorentz scalar is 
still a Lorentz scalar. Therefore s is frame-independent. There is no requirement that we use an 
inertial frame. It would also work fine, for example, in a frame rotating with the earth. We don’t 
even need to have a frame of reference. All of the above applies equally well to any coordinate 
system at all, even one that doesn’t have any sensible interpretation as some observer’s frame
of reference.


Page 156, problem 10:


Such a transformation would take an energy-momentum four-vector $(E, p)$, with $E > 0$, to
a different four-vector (E’,p’), with E’ < 0. That transformation would also have the effect of 
transforming a timelike displacement vector from the future light cone to the past light cone. But 
the Lorentz transformations were specifically constructed so as to preserve causality (property 
L5 on p. 51), so this can’t happen. 


Page 156, problem 11: 


A spatial plane is determined by the light’s direction of propagation and the relative velocity
of the source and observer, so the 3+1 case reduces without loss of generality to 2+1
dimensions. The frequency four-vector must be lightlike, so its most general possible form
is $(f, f\cos\theta, f\sin\theta)$, where $\theta$ is interpreted as the angle between the direction of propagation
and the relative velocity. Putting this through a Lorentz boost along the x axis, we find
$f' = \gamma f(1 + v\cos\theta)$, which agrees with Einstein’s equation on page ??, except for the arbitrary
convention involved in defining the sign of $v$.


Page 157, problem 12: 


The exact result depends on how one assumes the charge is distributed, so this can’t be any
more than a rough estimate. The energy density is $(1/8\pi k)E^2 \sim ke^2/r^4$, so the total energy
is an integral of the form $\int r^{-4}\,dV \sim \int r^{-2}\,dr$, which diverges like $1/r$ as the lower limit of
integration approaches zero. This tells us that most of the energy is at small values of $r$, so to
a rough approximation we can just take the volume of integration to be $r^3$ and multiply by a
fixed energy density of $ke^2/r^4$. This gives an energy of ~ $ke^2/r$. Setting this equal to $mc^2$ and
solving for $r$, we find $r \sim ke^2/mc^2 \sim 10^{-15}$ m.
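As a quick check of the final number, the Python sketch below evaluates $ke^2/mc^2$ for an electron. The function name `classical_radius` and the rounded constants are assumptions made for illustration.

```python
# Radius at which the electrostatic energy ~ k e^2 / r equals the rest energy m c^2.

K = 8.988e9             # Coulomb constant, N m^2 / C^2 (rounded)
E_CHARGE = 1.602e-19    # elementary charge, C (rounded)
M_ELECTRON = 9.109e-31  # electron mass, kg (rounded)
C = 2.998e8             # speed of light, m/s (rounded)


def classical_radius(q, m):
    """Return k q^2 / (m c^2) in meters."""
    return K * q**2 / (m * C**2)


r_e = classical_radius(E_CHARGE, M_ELECTRON)
print(f"r ~ {r_e:.1e} m")
```

The result is a few times $10^{-15}$ m, so it matches the rough estimate to the order of magnitude that the argument can support.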


Remark: Since experiments have shown that electrons do not have internal structure on this
scale, we conclude that quantum-mechanical effects must prevent the energy from blowing up
as $r \to 0$.


Page 157, problem 17: 
Doing a transformation first by $u$ and then by $v$ results in
$E'' = E - v \times (u \times E) + (u + v) \times B$.
This is not of the same form, because if $B = 0$, we can have $E'' \neq E$.


Solutions for Chapter 5 


Page 209, problem 1: The equation for the Christoffel symbols in terms of the metric was

$$\Gamma^c{}_{ab} = \frac{1}{2}g^{cd}\left(\partial_a g_{bd} + \partial_b g_{ad} - \partial_d g_{ab}\right).$$

Because both the metric matrix and its inverse appear, we get factors of a and 1/a that cancel 
out. Therefore there is no effect on the Christoffel symbols or on the geodesics. This certainly 
makes sense in the case of a = —1, because this is just a change in the choice of signature, which 
is an arbitrary convention. It also makes sense that rescaling the metric by a nonzero positive 
factor has no effect on the geodesics — we would expect this to change the measurement of 
geodesics, but we would not expect it to make different curves be geodesics. 


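Page 209, problem 3:
The inverse metric has $g^{tt} = t$ and $g^{\theta\theta} = -1/t$. The nonvanishing symbols are:

$$\Gamma^t{}_{tt} = \frac{1}{2}g^{tt}\left(\partial_t g_{tt} + \partial_t g_{tt} - \partial_t g_{tt}\right) = -1/2t$$

$$\Gamma^t{}_{\theta\theta} = \frac{1}{2}g^{tt}\left(-\partial_t g_{\theta\theta}\right) = t/2$$

$$\Gamma^\theta{}_{t\theta} = \Gamma^\theta{}_{\theta t} = \frac{1}{2}g^{\theta\theta}\left(\partial_t g_{\theta\theta}\right) = 1/2t$$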
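These values can be checked numerically. The Python sketch below computes Christoffel symbols for a diagonal metric by central finite differences and applies it to the metric whose inverse has components $t$ and $-1/t$, i.e. diagonal components $1/t$ and $-t$. The function name `christoffel_diagonal`, the step size, and the sample point are assumptions made for illustration.

```python
def christoffel_diagonal(metric, x, h=1e-6):
    """Christoffel symbols gamma[c][a][b] for a diagonal metric.

    metric(x) returns the list of diagonal components g_ii at coordinates x.
    Derivatives are approximated by central differences with step h.
    """
    n = len(x)
    g = metric(x)

    def dg(d, i):
        """Approximate the partial derivative of g_ii with respect to x^d."""
        xp, xm = list(x), list(x)
        xp[d] += h
        xm[d] -= h
        return (metric(xp)[i] - metric(xm)[i]) / (2.0 * h)

    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for c in range(n):
        for a in range(n):
            for b in range(n):
                total = 0.0
                if b == c:
                    total += dg(a, c)   # d_a g_{bc}
                if a == c:
                    total += dg(b, c)   # d_b g_{ac}
                if a == b:
                    total -= dg(c, a)   # d_c g_{ab}
                gamma[c][a][b] = 0.5 * total / g[c]   # g^{cc} = 1 / g_{cc}
    return gamma


def metric(x):
    """Diagonal components (g_tt, g_theta_theta) = (1/t, -t)."""
    t, _theta = x
    return [1.0 / t, -t]


t0 = 2.0   # hypothetical sample point
gamma = christoffel_diagonal(metric, [t0, 0.3])
print(gamma[0][0][0], gamma[0][1][1], gamma[1][0][1])
```

Index 0 stands for $t$ and index 1 for $\theta$, so the three printed numbers correspond to $-1/2t$, $t/2$, and $1/2t$ evaluated at the sample point.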


Page 209, problem 7: 
(a) Expanding in a Taylor series, they both have $g_{tt} = 1 + 2gz + \ldots$


(b) This property holds for [2] automatically because of the way it was constructed. In [1],
the nonvanishing Christoffel symbols (ignoring permutations of the lower indices) are $\Gamma^t{}_{zt} = g$
and $\Gamma^z{}_{tt} = ge^{2gz}$. We can apply the geodesic equation with the affine parameter taken to be the
proper time, and this gives $\ddot{z} = -ge^{2gz}\dot{t}^2$, where dots represent differentiation with respect to
proper time. For a particle instantaneously at rest, $\dot{t} = 1/\sqrt{g_{tt}} = e^{-gz}$, so $\ddot{z} = -g$.


(c) [2] was constructed by performing a change of coordinates on a flat-space metric, so it
is flat. The Riemann tensor of [1] has $R^t{}_{ztz} = -g^2$, so [1] isn’t flat. Therefore the two can’t be
the same under a change of coordinates.


(d) [2] is flat, so its curvature is constant. [1] has the property that under the transformation
$z \to z + c$, where $c$ is a constant, the only change is a rescaling of the time coordinate; by
coordinate invariance, such a rescaling is unobservable.


Page 210, problem 8: (a) 0<a<1 
(b)0<a<1l 
(c) 27 <2 


Page 210, problem 9: The double cone fails to satisfy axiom M2, because the apex has 
properties that differ topologically from those of other points: deleting it chops the space into 
two disconnected pieces. 


Page 210, problem 10: When we use a word like “torus,” there is some hidden ambiguity. We 
could mean something strange like the following. Suppose we construct the three-dimensional 
space of coordinates (x,y,z) in which all three coordinates are rational numbers. Then let a 
torus be the set of all such points lying at a distance of 1/2 from the nearest point on a unit 
circle. This is in some sense a torus, but it doesn’t have the topological properties one usually 
assumes. For example, two continuous curves on its surface can cross without having a point of 
intersection. We can’t get anywhere without assuming that the word “torus” refers to a surface 
that has the usual topological properties. 


Now let’s prove that it’s a manifold using both definitions. 


Using the topological definition, M1 is satisfied with n = 2, because every point on the surface 
lives in a two-dimensional neighborhood. M2 holds because the only differences between points 
are those that are not topological, e.g., Gaussian curvature. M3 holds due to the interpretation 
outlined in the first paragraph. 


Alternatively, we can use the local-coordinate definition. We have already shown that a
circle is a 1-manifold, which can be coordinatized in two patches by an angle $\phi$. The torus can
therefore be coordinatized by a pair of such angles, $(\phi_1, \phi_2)$, in four patches. Again we need to
assume the interpretation given above, since otherwise real-number pairs like $(\phi_1, \phi_2)$ wouldn’t
have the same topology as points on the rational-number torus.


Page 210, problem 11: In the torus, we can construct a closed curve C that encircles the 
hole. If we have a homeomorphism, C must have an image C’ under that homeomorphism 
that is a closed curve in the sphere. C’ can then be contracted continuously to a point, and 
since the inverse of the homeomorphism is also continuous, it would be possible to contract C 
continuously to a point. But this is impossible because C encircles the hole. 


Page 210, problem 12: (a) The Christoffel symbols are (assuming I didn’t make a mistake 
in calculating them by hand) $\Gamma^t_{xx} = (1/2)pt^{p-1}$ and $\Gamma^x_{xt} = \Gamma^x_{tx} = (1/2)pt^{-1}$. (b) After that, I 
resorted to a computer algebra system (Maxima), which told me that, for example, the Ricci 
tensor has $R_{tt} = (p/2 - p^2/4)t^{-2}$. 


Page 211, problem 13: 


The answer to this is a little subtle, since it depends on how we take the limit. Suppose we 
join two planes with a section of a cylinder having radius $\rho$, and let $\rho$ go to zero. The Gaussian 
curvature of a cylinder is zero, so in this limit we fail to reproduce the correct result. On the 
other hand, suppose we take a discus of radius $\rho_1$, whose edge has a curve of radius $\rho_2$. In the 
limit $\rho_1 \to +\infty$, $\rho_2 \to 0^+$, we can get either $K = 1/(\rho_1\rho_2) \to 0$ or $K \to +\infty$, depending on how 
quickly $\rho_1$ and $\rho_2$ approach their limits. 
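
As a concrete illustration of the two outcomes, one can pick explicit power laws relating the discus radius to the edge radius. The scalings below are hypothetical choices made only for this sketch. In both of them the discus radius goes to infinity while the edge radius goes to zero from above.

```latex
% Illustrative scalings (assumed for this example, not part of the problem statement)
\rho_2 = \rho_1^{-1/2}: \qquad K = \frac{1}{\rho_1\rho_2} = \rho_1^{-1/2} \to 0
  \quad\text{as } \rho_1 \to +\infty, \\
\rho_2 = \rho_1^{-2}: \qquad K = \frac{1}{\rho_1\rho_2} = \rho_1 \to +\infty
  \quad\text{as } \rho_1 \to +\infty.
```

In the first case the edge sharpens slowly compared with the flattening of the faces, and in the second case it sharpens quickly.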


Page 211, problem 14: 


The definition of the proper time is $d\tau^2 = dx^\mu dx_\mu$. Dividing by $d\lambda^2$ on both sides and using 
dots for differentiation with respect to $\lambda$, we have 

$$\left(\frac{d\tau}{d\lambda}\right)^2 = \dot{x}^\mu \dot{x}_\mu.$$

This allows us to determine $d\tau/d\lambda$ up to a sign, and the sign can be easily determined by 
inspection of the solution. This determines the relation between $\tau$ and $\lambda$ up to an additive 
constant. Alternatively, one could just normalize the velocity vector when setting the initial 
conditions. 


Solutions for Chapter 6 


Page 258, problem 3: (a) In the center of mass frame, symmetry guarantees that the test 
particle exits with a speed equal to the speed with which it entered, and the entry and exit 
velocities are v and —v. Now let’s switch to the sun’s frame. This involves adding u to all 
velocities, so the entry and exit velocities become v + u and —v + u. The difference in speed is 
2u. 


(b) The derivation assumed that velocities add linearly when you change frames of reference, 
which is a nonrelativistic approximation. Relativistically, velocities combine not like u+v but 
like (u+v)/(1+ uv). If you put in v = 1, the result for the combined velocity is always 1. 
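
The claim about v = 1 is easy to check numerically. The following is a minimal sketch in units with c = 1; the function name `add_velocities` is a hypothetical helper introduced here for illustration.

```python
def add_velocities(u, v):
    """Relativistic combination of collinear velocities, in units with c = 1."""
    return (u + v) / (1.0 + u * v)

# A light ray (v = 1) keeps speed 1 whatever frame velocity u it is combined with.
for u in (0.0, 0.1, 0.5, 0.9):
    print(u, add_velocities(u, 1.0))

# For small velocities the result is very close to the Galilean sum u + v.
print(add_velocities(1e-4, 2e-4))
```

This is why the 2u change in speed found in part a cannot apply to light: the frame change leaves its speed unchanged.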


This is a funny case where we can get the answer to a gravitational problem purely through 
special relativity. We might worry that the SR-based answer is wrong, because we really need 
GR for gravity. But we can get the same answer from GR, since GR says that a test particle 
always follows a geodesic, and a lightlike geodesic always remains lightlike. The reason SR 
worked is that an observer could watch a patch of flat space far away from the black hole, 
observe a wave-packet of light passing through that patch on the way to the black hole, and 
then observe it again on the way back out. Since the patch is flat, SR works. 


Page 258, problem 4: (a) L = 0 by symmetry. The quantity E can be interpreted as the 
energy per unit mass that is added to the entire system by inserting the test particle. Since the 
test particle starts at rest and far away, the added energy is simply the mass of the particle, 
and E = 1. 


(b) In the special case L = 0, E = 1, the general equation of motion for a test particle in a 
Schwarzschild spacetime becomes simply $\dot{r}^2 = 2m/r$. Separating variables and integrating, we 
have $r \propto s^{2/3}$, where the constant of integration is chosen to be zero. This clearly shows that 
we move from any given r to r = 0 in a finite proper time s. 
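
For readers who want the intermediate steps, here is a sketch of the integration, starting from the equation of motion $\dot{r}^2 = 2m/r$ and taking the infalling root. The starting radius $r_0$ and the choice to place s = 0 at the moment of arrival at r = 0 are conventions assumed for this example.

```latex
\dot{r} = -\sqrt{2m/r}
\;\;\Longrightarrow\;\;
r^{1/2}\,dr = -\sqrt{2m}\,ds
\;\;\Longrightarrow\;\;
\tfrac{2}{3}\,r^{3/2} = \sqrt{2m}\,(-s), \\
r = \left(\tfrac{9m}{2}\right)^{1/3}(-s)^{2/3}, \qquad
\Delta s\,(r_0 \to 0) = \frac{2}{3}\,\frac{r_0^{3/2}}{\sqrt{2m}} .
```

The elapsed proper time $\Delta s$ is finite for any finite $r_0$.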


Page 258, problem 5: (a) For a displacement with $d\phi = 0$, we have $ds^2 = g_{tt}\,dt^2$, so $g_{tt} = 
(ds/dt)^2 = 9\sin^2 t\cos^2 t$. For an azimuthal displacement, $ds = y\,d\phi$, so $g_{\phi\phi} = y^2 = \sin^6 t$. 

(b) At places on the surface of revolution corresponding to the cusps of the astroid, one or 
both of the lower-index elements of the metric go to zero, which means that the corresponding 
upper-index elements blow up. These are the sharp points of the surface at the x axis and the 
sharp edge at its waist. There are at least coordinate singularities there, but the question is 
whether they are intrinsic. The only intrinsic measure of curvature in two dimensions is the 
Gaussian curvature, which can be interpreted as (minus) the product of the curvatures along 
the two principal axes, here $k_1 = -(2/3)\csc 2t$ and $k_2 = 1/y = \sin^{-3} t$. At the waist, both 
factors blow up, so the Gaussian curvature, which is intrinsic, blows up, and this is not just a 
coordinate singularity. The same thing happens at the tips. Interestingly, a geodesic that hits 
one of these singularities can still be traced through in a continuous way and extended onward 
such that its arc length remains finite. This property is called geodesic completeness. 


Page 259, problem 6: (a) There are singularities at r = 0, where $g_{\theta\theta} = 0$, and $r = 1/\omega$, 
where $g_{tt} = 0$. These are considered singularities because the inverse of the metric blows up. 
They’re coordinate singularities, because they can be removed by a change of coordinates back 
to the original non-rotating frame. 

(b) This one has singularities in the same places. The one at r = 0 is a coordinate singularity, 
because at small r the $\omega$ dependence is negligible, and the metric is simply that of ordinary 
plane polar coordinates in flat space. The one at $r = 1/\omega$ is not a coordinate singularity. The 
following Maxima code calculates its scalar curvature $R = R^a{}_a$, which is essentially just the 
Gaussian curvature, since this is a two-dimensional space. 


load(ctensor);
dim:2;
ct_coords:[r,theta];
/* lower-index metric; w stands for omega */
lg:matrix([-1,0],
          [0,-r^2/(1-w^2*r^2)]);
cmetric();
ricci(true);
scurvature();


The result is $R = 6\omega^2/(1 - 2\omega^2r^2 + \omega^4r^4)$. This blows up at $r = 1/\omega$, which shows that this is 
not a coordinate singularity. The fact that R does not blow up at r = 0 is consistent with our 
earlier conclusion that r = 0 is a coordinate singularity, but would not have been sufficient to 
prove that conclusion. 
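
As an independent cross-check on the computer-algebra result, here is a rough numerical sketch. It assumes the metric is $-(dr^2 + f^2 d\theta^2)$ with $f = r/\sqrt{1 - \omega^2 r^2}$, uses the standard formula $K = -f''/f$ for the Gaussian curvature of $dr^2 + f^2 d\theta^2$, and takes $R = -2K$ because flipping the overall sign of the metric flips the sign of the scalar curvature. The sign convention and the helper names are assumptions of this example, not output from Maxima.

```python
import math

def f(r, w):
    """Radial profile: the metric is -(dr^2 + f(r)^2 dtheta^2)."""
    return r / math.sqrt(1.0 - (w * r) ** 2)

def scalar_curvature(r, w, h=1e-4):
    """Estimate R from K = -f''/f, with f'' taken by a central difference."""
    f2 = (f(r + h, w) - 2.0 * f(r, w) + f(r - h, w)) / h**2
    K = -f2 / f(r, w)      # Gaussian curvature of dr^2 + f^2 dtheta^2
    return -2.0 * K        # sign flips for the negative-definite metric

def closed_form(r, w):
    """The closed-form result quoted in the text."""
    return 6.0 * w**2 / (1.0 - 2.0 * w**2 * r**2 + w**4 * r**4)

for r in (0.1, 0.5, 0.9):
    print(r, scalar_curvature(r, 1.0), closed_form(r, 1.0))
```

The two columns should agree to several digits for radii well inside $r = 1/\omega$, and both grow without bound as that radius is approached.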

(c) The argument is incorrect. The Gaussian curvature is not just proportional to the angular 
deficit $\epsilon$, it is proportional to the limit of $\epsilon/A$, where A is the area of the triangle. The area of 
the triangle can be small, so there is no upper bound on the ratio $\epsilon/A$. Debunking the argument 
restores consistency with the answer to part b. 


Page 259, problem 10: The only nonvanishing Christoffel symbol is $\Gamma^t_{tt} = -1/2t$. The 
antisymmetric treatment of the indices in $R^a{}_{bcd} = \partial_c\Gamma^a_{db} - \partial_d\Gamma^a_{cb} + \Gamma^a_{ce}\Gamma^e_{db} - \Gamma^a_{de}\Gamma^e_{cb}$ guarantees 
that the Riemann tensor must vanish when there is only one nonvanishing Christoffel symbol. 


Page 259, problem 11: The first thing one notices is that the equation $R_{ab} = k$ isn’t written 
according to the usual rules of grammar for tensor equations. The left-hand side has two lower 
indices, but the right-hand side has none. In the language of freshman physics, this is like 
setting a vector equal to a scalar. Suppose we interpret it as meaning that each of R’s 16 
components should equal k in a vacuum. But this still isn’t satisfactory, because it violates 
coordinate-independence. For example, suppose we are initially working with some coordinates 
$x^\mu$, and we then rescale all four of them according to $x'^\mu = 2x^\mu$. Then the components of $R_{ab}$ 
all scale down by a factor of 4. But this would violate the proposed field equation. 


Page 259, problem 12: The following Maxima code calculates the Ricci tensor for a metric 
with $g_{tt} = h$ and $g_{rr} = -k$. 


load(ctensor);
dim:3;
ct_coords:[t,r,phi];
depends(h,r);
depends(k,r);
lg:matrix([h,0,0],
          [0,-k,0],
          [0,0,-r^2]);
cmetric();
ricci(true);


Inspecting the output (not reproduced here), we see that $R_{\phi\phi} = 0$ requires $k'/k = h'/h$. Since 
the logarithmic derivatives of h and k are the same, the two functions can differ by at most a 
constant factor c. So now we do a second iteration of the calculation: 


load(ctensor);
dim:3;
ct_coords:[t,r,phi];
depends(h,r);
lg:matrix([h,0,0],
          [0,-c*h,0],
          [0,0,-r^2]);
cmetric();
ricci(true);


The result for $R_{rr}$ is independent of c. Since h is essentially the gravitational potential, we have 
the requirements $h' > 0$ (because gravity is attractive) and $h'' < 0$ (because gravity weakens 
with distance). Therefore we find that $R_{rr}$ is positive, and we do not obtain a vacuum solution. 


Page 260, problem 13: This idea is not well defined because it implicitly assumes that we 
can fix a global frame of reference. The notion of reversing velocity vectors (i.e., reversing 
the spacelike components of 4-velocities) implies that there are some velocity vectors whose 
spacelike parts are zero, so that they aren’t changed by a flip. This amounts to choosing a 
frame of reference. To be able to do the flip globally, you’d have to have some sensible notion 
of a global frame of reference, but we don’t necessarily have that. (In a spacetime with closed 
timelike curves, there is also the issue that we don’t have complete freedom to choose initial 
conditions on a spacelike surface, because these conditions might end up not being consistent 
with themselves when evolved around a CTC.) 


Page 260, problem 14: For a particle with zero or nonzero mass (i.e., a particle that is not 
a tachyon), the velocity vector must be either timelike or null. In the + − − − signature, this 
means that its norm must be greater than or equal to zero. Use units such that the mass of 
the black hole is 1/2, so that the Schwarzschild radius is 1, and let $A = 1 - 1/r$ be the factor 
appearing in the Schwarzschild metric. Let the motion be in the plane $\theta = \pi/2$, so that we 
can ignore $\theta$ as a coordinate. The norm of a velocity vector is then $A\dot{t}^2 - A^{-1}\dot{r}^2 - r^2\dot{\phi}^2$, where 
the dots represent derivatives with respect to an affine parameter such as proper time. If the 
particle is to turn around at some point, then at that point we have $\dot{r} = 0$. Then the only way 
to get a positive norm is if A > 0, which requires r > 1. 


Solutions for Chapter 7 


Page 290, problem 2: (a) If she makes herself stationary relative to the sun, she will still 
experience local geometrical changes because of the planets. (b) If it was to be impossible 
for her to prove the universe’s nonstationarity, then any world-line she picked would have to 
experience constant local geometrical conditions. A counterexample is any world-line extending 
back to the Big Bang, which is a singularity with drastically different conditions than any other 
region of spacetime. (c) To maintain a constant local geometry, she would have to “surf” the 
wave, but she can’t do that, because it propagates at the speed of light. (d) There are places 
where the local mass-energy density is increasing, and the field equations link this to a change 
in the local geometry. 


Page 290, problem 5: 


Under these special conditions, the geodesic equations become $\ddot{r} = \Gamma^r_{tt}\dot{t}^2$, $\ddot{\phi} = 0$, $\ddot{t} = 0$, 
where the dots can in principle represent differentiation with respect to any affine parameter we 
like, but we intend to use the proper time s. By symmetry, there will be no motion in the z 
direction. The Christoffel symbol equals $-(1/2)e^r(\cos\sqrt{3}r - \sqrt{3}\sin\sqrt{3}r)$. At a location where 
the cosine equals 1, this is simply $-e^r/2$. For t, we have $dt/ds = 1/\sqrt{g_{tt}} = e^{-r/2}$. The result of 
the calculation is simply $\ddot{r} = -1/2$, which is independent of r. 


Page 291, problem 6: 


The Petrov metric is one example. The metric has no singularities anywhere, so the r 
coordinate can be extended from $-\infty$ to $+\infty$, and there is no point that can be considered the 
center. The existence of a $d\phi\,dt$ term in the metric shows that it is not static. 


A simpler example is a spacetime made by taking a flat Lorentzian space and making it wrap 
around topologically into a cylinder, as in problem 1 on p. 120. As discussed in the solution 
to that problem, this spacetime has a preferred state of rest in the azimuthal direction. In 
a frame that is moving azimuthally relative to this state of rest, the Lorentz transformation 
requires that the phase of clocks be adjusted linearly as a function of the azimuthal coordinate 
$\phi$. As described in section 3.5.4, this will cause a discontinuity once we wrap around by $2\pi$, and 
therefore clock synchronization fails, and this frame is not static. 


Page 291, problem 7: 


For an observer in a circular orbit at radius r, we can trivially tell that when r is large, the 
result is Newtonian, so the Doppler shifts will be small and will be both redshifts and blueshifts. 
I don’t know of any simple way to prove, without a calculation, that even at small radii there 
will be both redshifts and blueshifts. 


Let the units be such that the Schwarzschild radius is 1 (which means that the mass of the 
black hole is 1/2). Vectors are expressed in Schwarzschild coordinates $(t, r, \theta, \phi)$. The orbiting 
observer is in the plane $\theta = \pi/2$. The $\pm$ signs refer to the extreme cases of the orbiting observer 
detecting a ray of light from the forward direction and the backward direction. Solving the 
geodesic equation for a circular orbit, we find that the normalized velocity vector of the orbiting 
observer is 

$$u'^\kappa = \left(1 - \frac{3}{2r}\right)^{-1/2}\left(1, 0, 0, 2^{-1/2}r^{-3/2}\right).$$

This expression misbehaves for r < 3/2; for radii that small, there are no circular orbits. 
(Circular orbits are also unstable for r < 3.) Let the velocity vector of the ray at detection, 
with an arbitrary choice of affine parameter, be 

$$v'^\kappa = \left(1, 0, 0, \pm(1 - 1/r)^{1/2}r^{-1}\right).$$

The velocity vector of the distant observer emitting the ray is 

$$u^\kappa = (1, 0, 0, 0).$$

We would also like to extrapolate backward in time to find v, the velocity vector of the ray upon 
emission by the distant source. The complete vector probably can’t be found in closed form, 
but because there is a conserved energy, we can get the only component we need in closed form 
as 

$$v^\kappa = (1 - 1/r, ?, ?, ?).$$

The Doppler shift is 

$$\frac{\omega'}{\omega} = \frac{u'_\kappa v'^\kappa}{u_\kappa v^\kappa} = \left(1 - \frac{3}{2r}\right)^{-1/2}\left(1 \mp [2r(1 - 1/r)]^{-1/2}\right).$$

Graphing shows that all the way down to r = 3/2, one solution always has $\omega'/\omega > 1$ and one 
$\omega'/\omega < 1$. 
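
The graphical claim can also be spot-checked numerically. The sketch below assumes the Doppler factor has the form $\omega'/\omega = (1 - 3/2r)^{-1/2}(1 \mp [2r(1 - 1/r)]^{-1/2})$, in units where the Schwarzschild radius is 1. The function name `doppler_factors` is a hypothetical helper introduced for this example.

```python
def doppler_factors(r):
    """Return (larger, smaller) values of omega'/omega for a circular orbit.

    Units: Schwarzschild radius = 1. Only meaningful for r > 3/2.
    """
    norm = (1.0 - 3.0 / (2.0 * r)) ** -0.5        # normalization of the orbiting observer's velocity
    x = (2.0 * r * (1.0 - 1.0 / r)) ** -0.5       # direction-dependent term
    return norm * (1.0 + x), norm * (1.0 - x)

for r in (1.51, 2.0, 3.0, 10.0, 1000.0):
    larger, smaller = doppler_factors(r)
    print(f"r = {r:8.2f}   larger: {larger:.4f}   smaller: {smaller:.4f}")
```

At each sampled radius one value lies above 1 and the other below 1, and both approach 1 at large r, as expected in the Newtonian limit.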


For large values of r, we can understand the leading-order behavior of this result in 
semi-Newtonian terms. The Newtonian orbital velocity is $v = (2r)^{-1/2}$, which gives a 
special-relativistic longitudinal Doppler shift $1 \pm (2r)^{-1/2} + \frac{1}{4r} + \ldots$, the gravitational time dilation 
is $1 + \frac{1}{2r} + \ldots$, and the product of these is $1 \pm (2r)^{-1/2} + \frac{3}{4r} + \ldots$, in agreement with the exact 
expression up to order 1/r. 


It is also interesting to compare the maximum redshift $D_o$ for our observer in a circular 
orbit with the result of example 8, p. 267 for an observer infalling radially from rest at infinity, 
which is a maximum redshift for that observer. Call the latter $D_r$. At large radii, $D_r$ is a bigger 
redshift, because the effect is semi-Newtonian, and the radially infalling observer has velocity 
that is higher by a factor of $\sqrt{2}$. But $D_o$ blows up at r = 3/2, while $D_r$ blows up only at r = 0. 
Therefore there is a point where the two curves cross, which turns out to be at r = 2. 


Solutions for Chapter 8 


Page 367, problem 2: No. General relativity only allows coordinate transformations that are 
smooth and one-to-one (see p. 98). This transformation is not smooth at t = 0. 


Page 367, problem 5: (a) The Friedmann equations are 

$$\frac{\ddot{a}}{a} = \frac{1}{3}\Lambda - \frac{4\pi}{3}(\rho + 3P)$$

and 

$$\left(\frac{\dot{a}}{a}\right)^2 = \frac{1}{3}\Lambda + \frac{8\pi}{3}\rho - ka^{-2}.$$

The first equation is time-reversal invariant because the second derivative stays the same under 
time reversal. The second equation is also time-reversal invariant, because although the first 
derivative flips its sign under time reversal, it is squared. 

(b) We typically do not think of a singularity as being a point belonging to a manifold at all. If 
we want to create this type of connected, symmetric back-to-back solution, then we need the Big 
Bang singularity to be a point in the manifold. But this violates the definition of a manifold, 
because then the Big Bang point would have topological characteristics different from those of 
other points: deleting it separates the spacetime into two pieces. 


Page 367, problem 4: Example 16 on page 337, the cosmic girdle, showed that a rope that 
stretches over cosmological distances does expand significantly, unlike Brooklyn, nuclei, and 
solar systems. Since the Milne universe is nothing but a flat spacetime described in funny 
coordinates, something about that argument must fail. The argument used in that example 
relied on the use of a closed cosmology, but the Milne universe is not closed. This is not a 
completely satisfying resolution, however, because we expect that a rope in an open universe 
will also expand, except in the special case of the Milne universe. 


In a nontrivial open universe, every galaxy is accelerating relative to every other galaxy. 
By the equivalence principle, these accelerations can also be seen as gravitational fields, and 
tidal forces are what stretch the rope. In the special case of the Milne universe, there is no 
acceleration of test particles relative to other test particles, so the rope doesn’t stretch. 


Example 18 on page 340, the cosmic whip, resulted in the conclusion that the velocity of 
the rope-end passing by cannot be interpreted as a measure of the velocity of the distant galaxy 
to which the rope’s other end is hitched, which makes sense because cosmological solutions are 
nonstationary, so there is no uniquely defined notion of the relative velocity of distant objects. 
The Milne universe, however, is stationary, so such velocities are well defined. The key here is 
that nothing is accelerating, so the time delays in the propagation of information do not lead to 
ambiguities in extrapolating to a distant object’s velocity “now.” 


The Milne case also avoids the paradox in which we could imagine that if the rope is suf- 
ficiently long, its end would be moving at more than the speed of light. Although there is no 
limit to the length of a rope in the Milne universe (there being no tidal forces), the Hubble 
law cannot be extrapolated arbitrarily, since the expanding cloud of test particles has an edge, 
beyond which there is only vacuum. 


Page 367, problem 6: The cosmological constant is a scalar, so it doesn’t change under 
reflection. The metric is also invariant under reflection of any coordinate. This follows because 
we have assumed that the coordinates are locally Lorentzian, so that the metric is diagonal. 
It can therefore be written as a line element in which the differentials are all squared. This 
establishes that the $\Lambda g_{ab}$ term is invariant under any spatial or temporal reflection. 


The specialized form of the energy-momentum tensor diag$(-\rho, P, P, P)$ is also clearly 
invariant under any reflection, since both pressure and mass-energy density are scalars. 


The form of the tensor transformation law for a rank-2 tensor guarantees that the diagonal 
elements of such a tensor stay the same under a reflection. The off-diagonal elements will flip 
sign, but since only the G and T terms in the field equation have off-diagonal terms, the field 
equations remain valid under reflection. 


In summary, the Einstein field equations retain the same form under reflection in any co- 
ordinate. This important symmetry property, which is part of the Poincaré group in special 
relativity, is retained when we make the transition to general relativity. It’s a discrete sym- 
metry, so it wasn’t guaranteed to exist simply because of general covariance, which relates to 
continuous coordinate transformations. 


Page 367, problem 7: (a) The Einstein field equations are $G_{ab} = 8\pi T_{ab} + \Lambda g_{ab}$. That means 
that in a vacuum, where T = 0, a cosmological constant is equivalent to $\rho = (1/8\pi)\Lambda$ and 
$P = -(1/8\pi)\Lambda$. This gives $\rho + 3P = (1/8\pi)(-2\Lambda)$, which violates the SEC for $\Lambda > 0$, since part 
of the SEC is $\rho + 3P \ge 0$. 

(b) Since our universe appears to have a positive cosmological constant, and the paper by 
Hawking and Ellis assumes the strong energy condition, doubts are raised about the conclusion 
of the paper as applied to our universe. However, the theorem is being applied to the early 
universe, which was not a vacuum. Both P and $\rho$ were large and positive in the early, 
radiation-dominated universe, and therefore the SEC was not violated. 


Page 368, problem 8: 


(a) The Ricci tensor is $R_{tt} = g^2e^{2gz}$, $R_{zz} = -g^2$. The scalar curvature is $2g^2$, which is 
constant, as expected. 


(b) Both $G_{tt}$ and $G_{zz}$ vanish by a straightforward computation. 


(c) The Einstein tensor is $G_{tt} = 0$, $G_{xx} = G_{yy} = g^2$, $G_{zz} = 0$. It is unphysical because it has 
a zero mass-energy density, but a nonvanishing pressure. 
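
The Einstein tensor components can be checked by hand from the Ricci tensor using $G_{ab} = R_{ab} - \frac{1}{2}Rg_{ab}$. The sketch below assumes the metric has the diagonal form $g_{tt} = e^{2gz}$, $g_{xx} = g_{yy} = g_{zz} = -1$ (inferred here from the Ricci components rather than quoted from the problem statement) and that $R_{xx} = R_{yy} = 0$.

```latex
R = g^{tt}R_{tt} + g^{zz}R_{zz} = e^{-2gz}\,g^2e^{2gz} + (-1)(-g^2) = 2g^2, \\
G_{tt} = g^2e^{2gz} - \tfrac{1}{2}(2g^2)\,e^{2gz} = 0, \\
G_{xx} = G_{yy} = 0 - \tfrac{1}{2}(2g^2)(-1) = g^2, \\
G_{zz} = -g^2 - \tfrac{1}{2}(2g^2)(-1) = 0 .
```

Under these assumptions the vanishing of $G_{tt}$ is what gives zero mass-energy density, while the nonzero $G_{xx}$ and $G_{yy}$ are the transverse pressure.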


Page 368, problem 9: 


This proposal is an ingenious attempt to propose a concrete method for getting around the 
fact that in relativity, there is no unique way of defining the relative velocities of objects that 
lie at cosmological distances from one another. 


Because the Milne universe is a flat spacetime, there is nothing to prevent us from laying 
out a chain of arbitrary length. The chain will not, for example, be subject to the kind of tidal 
forces that would inevitably break a chain that was lowered through the event horizon of a 
black hole. But this only guarantees us that we can have a chain of a certain length as measured 
in the chain’s frame. An observer at rest with respect to the chain describes all the links of 
the chain as existing simultaneously at a certain set of locations. But this is a description in 
(T, R) coordinates. To an observer who prefers the FRW coordinates, the links do not exist 
simultaneously at these locations. This observer says that the supposed locations of distant 
points on the chain occurred far in the past, and suspects that the chain has broken since then. 


The paradox can also be resolved from the point of view of the (T, R) coordinates. The 
chain is long enough that its end hangs out beyond the edges of the expanding cloud of galaxies. 
Since there are no galaxies beyond the edge, there are no galaxies near the end of the chain 
with respect to which the chain could be moving at > c. 


Page 368, problem 10: Frames are local, not global. One of the things we have to specify in 
order to define a frame of reference is a state of motion. To define the volume of the observable 
universe, there end up being three spots in the definition at which we might need to pick a state 
of motion. I’ve labeled these 1-2-3 below. 


Observer O is in some state of motion [1] at event A. O’s past light-cone intersects the 
surface of last scattering (or some other surface where some other physically well-defined thing 
happens) in a spacelike two-surface S. S does not depend on O’s state of motion. At every 
event P on S, we define a state of motion [2] that is at rest relative to the Hubble flow, and 
we construct a world-line that starts out in this state of motion and extends forward in time 
inertially. One of these world-lines intersects O’s world-line at A. Let the proper time interval 
along this world-line be t. We extend all the other world-lines from all the other P by the same 
interval of proper time t. The end-points of all these world-lines constitute a spacelike 2-surface 
B that we can define as the boundary of the observable universe according to O. Let R be the 
3-surface contained inside B. In order to define R, we need to define some notion of simultaneity, 
which depends on one’s state of motion [3]. If we like, we can pick this state of motion to be 
one at rest with respect to the Hubble flow. Given this choice, we can define the volume V of 
R (e.g., by chopping R up into pieces and measuring those pieces using rulers that are in this 
state of motion). 


State of motion 1 had absolutely no effect on V, but states of motion 2 and 3 did. If O is 
not at rest relative to the Hubble flow at A, then 2 and 3 do not match O’s state of motion at 
A. This probably means that O will object that V is not the answer in his frame but in someone 
else’s. However, there is no clear way to satisfy O by modifying the above definition. We can’t 
just say that 2 and 3 should be chosen to be the same as O’s state of motion at A, because 
frames are local things, so matching them to O’s motion at A isn’t the same as matching them 
at points far from A. In a cosmological solution there is no well-defined notion of whether or 
not two cosmologically distant objects are at rest relative to one another. 


In particular, it is not meaningful to try to calculate a reduced value of V based on Lorentz 
contraction for O’s velocity relative to the Hubble flow. Lorentz contractions can’t be applied 
to a curved spacetime. 


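Page 368, problem 11: The Friedmann equations reduce to 

$$\frac{\ddot{a}}{a} = -\frac{4\pi}{3}(1 + 3w)\rho, \qquad \left(\frac{\dot{a}}{a}\right)^2 = \frac{8\pi}{3}\rho.$$

Eliminating $\rho$, we find 

$$\frac{\ddot{a}}{a} = -\beta\left(\frac{\dot{a}}{a}\right)^2,$$

where $\beta = (1 + 3w)/2$. For a solution of the form $a \propto t^\delta$, calculation of the derivatives results in 
$\delta = 1/(1 + \beta) = (2/3)/(1 + w)$. For dust, $\delta = 2/3$, which checks out against the result on p. 347. 
For radiation, $\delta = 1/2$. For a cosmological constant, $w = -1$ gives $\delta = \infty$, so the solution has a 
different form. 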


Page 368, problem 12: The integral is exactly the same as the one in example 22 on p. 348 
for the dust case, except that the exponent 2/3 is generalized to $\delta = (2/3)/(1 + w)$, as shown 
in the solution to problem 11. The result is $L/t = 1/(1 - \delta) = (w + 1)/(w + 1/3)$. In the 
radiation-dominated case, we have L/t = 2. 
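
A few lines of code make it easy to tabulate these results for different equations of state. This is a minimal sketch assuming the exponent $\delta = (2/3)/(1 + w)$ and the ratio $L/t = 1/(1 - \delta)$; the function names are hypothetical helpers chosen for this example.

```python
def delta(w):
    """Exponent in a proportional to t**delta for P = w*rho (requires w != -1)."""
    return (2.0 / 3.0) / (1.0 + w)

def horizon_ratio(w):
    """L/t = 1/(1 - delta); finite and positive only for w > -1/3."""
    return 1.0 / (1.0 - delta(w))

for name, w in (("dust", 0.0), ("radiation", 1.0 / 3.0)):
    # The last column uses the equivalent form (w + 1)/(w + 1/3) as a consistency check.
    print(name, delta(w), horizon_ratio(w), (w + 1.0) / (w + 1.0 / 3.0))
```

The last two columns should agree, since $1/(1 - \delta)$ and $(w + 1)/(w + 1/3)$ are the same expression written two ways.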


Page 368, problem 13: The following Maxima code accomplishes the necessary calculations. 


/* Kantowski-Sachs spacetime */
load(ctensor);
ct_coords:[t,theta,phi,z];
lg:matrix([1,0,0,0],
          [0,-1/Lambda,0,0],
          [0,0,-(1/Lambda)*sin(theta)^2,0],
          [0,0,0,-exp(2*sqrt(Lambda)*t)])$
cmetric();
cgeodesic(true);
leinstein(true);
scurvature();


(a) The geodesic equations output by cgeodesic verify that a world-line of the given form is a 
geodesic. Direct application of the metric shows that $\lambda$ is the proper time. 

(b) This follows from the form of the spatial terms of the metric. 

(c) The lower-index Einstein tensor calculated by the code above equals $\Lambda$ multiplied by the 
lower-index metric. 

(d) The Ricci scalar comes out as claimed. 

(e) Our earlier treatment was based on the assumptions of isotropy and homogeneity. This 
spacetime is clearly anisotropic. (The result of part d suggests, as turns out to be the case, that 
it is homogeneous.) 


Solutions for Chapter 9 


Page 384, problem 1: (a) The radiated power is on the order of $(G/c^5)(mr^2)^2\omega^6$. Taking 
the mass to be 10 tons, r = 10 m, we find that the frequency required is on the order of 10° 
revolutions per minute. 

(b) Using the same estimate for the radiated power as in part a, we get about 10~°? W. For the 
given excitation energy, this implies a rate of decay by gravitational wave emission of something 
like $10^{-21}\ \mathrm{s}^{-1}$. In competition with a gamma decay having a rate on the order of 1 yr$^{-1}$, this 
gives a probability of about $10^{-14}$ for gravitational decay. This actually doesn’t sound so low 
that its detection would be impossible, but we would have to have a case where the extremely 
severe selection rule for gamma decay was not matched by an equally strong hindrance of the 
gravitational decay. 


Page 384, problem 2: (a) The members of the Hulse-Taylor system are spiraling toward one 
another as they lose energy to gravitational radiation. If one of them were replaced with a 
low-mass test particle, there would be negligible radiation, and the motion would no longer be 
a spiral. This is similar to the issues encountered on pp. 39ff because the neutron stars in the 
Hulse-Taylor system suffer a back-reaction from their own gravitational radiation. 


(b) If this occurred, then the particle’s world-line would be displaced in space relative to a 
geodesic of the spacetime that would have existed without the presence of the particle. What 
would determine the direction of that displacement? It can’t be determined by properties of 
this preexisting, ambient spacetime, because the Riemann tensor is that spacetime’s only local, 
intrinsic, observable property. At a fixed point in spacetime, the Riemann tensor is even under 
spatial reflection, so there’s no way it can distinguish a certain direction in space from the 
opposite direction. 


What else could determine this mysterious displacement? By assumption, it’s not deter- 
mined by a preexisting, ambient electromagnetic field. If the particle had charge, the direction 
could be one imposed by the back-reaction from the electromagnetic radiation it had emitted 
in the past. If the particle had a lot of mass, then we could have something similar with gravi- 
tational radiation, or some other nonlinear interaction of the particle’s gravitational field with 
the ambient field. But these nonlinear or back-reaction effects are proportional to $q^2$ and $m^2$, 
so they vanish when q = 0 and m = 0. 


The only remaining possibility is that the result violates the symmetry of space expressed by 
L1 on p. 51; the Lorentzian geometry is the result of L1-L5, so violating L1 should be considered 
a violation of Lorentz invariance. 


none in Galilean spacetime, 101 


Michelson-Morley experiment, 69 
Milne universe, 332 
Minkowski, 41 
model 

mathematical, 93 
momentum four-vector, 126 
Mossbauer effect, 36 
muon, 16 


neighborhood, 195 
neutrino, 130 

neutron star, 145, 232 
no-cloning theorem, 215 
no-hair theorems, 282 
normal coordinates, 164 
null energy condition, 308 
null infinity, 283 


observable universe, 348 

size and age, 348 
open cosmology, 327 
open set, 195 
optical effects, 122 
orbit 

Killing vector, 261 
orientability, 152 
orientable 

in time, 224 
orthogonality, 125 


parallel postulate, 18 
parallel transport, 90, 91 
parity, 108 

Pasch, Moritz, 19 

patch, 199 

Penrose 


graphical notation for tensors, 47 


Penrose diagram, 271 
Penrose, Roger, 122, 242 


Penrose-Hawking singularity theorems, 316, 332 


Penzias, Arno, 323 
Petrov classification, 378 
Petrov metric, 287, 290 
photon 
mass, 131 
Pioneer anomaly, 338 
Planck mass, 186 
Planck scale, 186 
Playfair’s axiom, 18 
Poincaré group, 108, 400 
polarization 
of gravitational waves, 378 
of light, 129 
potential, 32 
relativistic vs. Newtonian, 284 
Pound-Rebka experiment, 16, 34 
Poynting vector, 311 
principal group, 109 
prior geometry, 119 
projective geometry, 98 
proper distance, 326 
proper time, 123 
pulsar, 145, 232 


raising an index, 105 

rank of a tensor, 102 

rapidity, 65 

red-shift 
cosmological 


kinematic versus gravitational, 285, 340 


gravitational, 16, 34 
Ricci curvature, 161 

defined, 170 
Ricci scalar, 236 
Riemann curvature tensor, 168 
Riemann tensor 

defined, 168 
rigid-body rotation, 110 
Rindler coordinates, 209 
ring laser, 73 
Robinson 

Abraham, 95 
Robinson, Abraham, 88 
rotating frame of reference, 109, 286 
rotation 

rigid, 110 


Sagittarius A*, 239, 282 
Sagnac effect, 112, 280 
defined, 73 
in GPS, 59 
proportional to area, 84 
scalar 
defined, 46 
scalar curvature, 236 
Schwarzschild metric, 223 
in d dimensions, 252 
Schwarzschild, Karl, 217 
shielding 
gravitational, 315 
signature 
change of, 254 
defined as a list of signs, 218 
defined as an integer, 254 
singularity, 20, 241 
conical, 246 
coordinate, 236 
formal definition, 243 
naked, 248 
timelike, 247 
singularity theorems, 316 
Sirius B, 16 
spacelike, 63 
spaceship paradox, 65, 210 
special relativity 
defined, 30 
spherical geometry, 94 
spherical symmetry, 269 
spontaneous symmetry breaking, 348 
standard cosmological coordinates, 326 
static spacetime, 281 
stationary, 278 
asymptotically, 279 
stationary action, 62 
steady-state cosmology, 324, 363 
stress-energy tensor, 162, 295 
divergence-free, 294 
interpretation of, 298 
of an electromagnetic wave, 309 
symmetry of, 298 
string theory, 186 
strong energy condition, 308 
surface of last scattering, 323 
Susskind, Leonard, 252 
Sylvester’s law of inertia, 255 
symmetrization, 103 
symmetry 
spherical, 269 
symmetry breaking 


spontaneous, 348 
synchronization 
Einstein convention, 280 


tachyon, 158 
tangent space, see tangent vector 
tangent vector, 88, 201, 262 
Tarski, Alfred, 93 
Taub-NUT spacetimes, 197, 246 
Taylor, J.H., 232 
tensor, 102, 139 
antisymmetric, 103 
Penrose graphical notation, 47 
rank, 102, 203 
symmetric, 103 
transformation law, 139 
tensor density, 152 
tensor transformation laws, 138 
Terrell, James, 122 
Thomas precession, 72, 171, 225 
Thomas, Llewellyn, 82 
time dilation 
gravitational, 15, 33 
nonuniform field, 59 
kinematic, 15, 55 
time reversal, 108 
of the Schwarzschild metric, 223 
symmetry of general relativity, 223 
time-orientable, 224 
timelike, 63 
Tolman-Oppenheimer-Volkoff limit, 146 
topology, 194 
topology change, 198 
torsion, 181 
tensor, 184 
trace energy condition, 308 
transformation laws, 138 
transition map, 199 
transverse polarization 
of gravitational waves, 378 
of light, 129 
trapped surface, 317 
triangle inequality, 108 
Type III solution, 378 
Type N solution, 378 


Uhlenbeck, 81 


uniform gravitational field, 209, 285, 368 


unitarity, 215 


units, 202 
geometrized, 220 
universe 
observable, 348 
size and age, 348 
upsidasium, 26 


vector 
defined, 46 
dual, 47 
Penrose graphical notation, 47 
summarized, 413 
vectors and dual vectors, 101 
summarized, 413 
velocity addition, 65 
velocity four-vector, 124 
velocity vector, 124 
volume 
spacetime, 155 
volume expansion, 317 


Waage, Harold, 26 
wavenumber, 133 
waves 
gravitational, see gravitational waves 
weak energy condition, 308 
weight of a tensor density, 152 
Wheeler, John, 26 
white dwarf, 144 
Wilson, Robert, 323 
world-line, 21 
Euclidean geometry (page 18): 


E1 Two points determine a line. 
E2 Line segments can be extended. 


E3 A unique circle can be constructed given any point as its center and any line segment as 
its radius. 


E4 All right angles are equal to one another. 


E5 Parallel postulate: Given a line and a point not on the line, exactly one line can be drawn 
through the point and parallel to the given line.* 


Ordered geometry (page 19): 


O1 Two events determine a line. 


O2 Line segments can be extended: given A and B, there is at least one event such that [ABC] 
is true. 


O3 Lines don’t wrap around: if [ABC] is true, then [BCA] is false. 


O4 Betweenness: For any three distinct events A, B, and C lying on the same line, we can 
determine whether or not B is between A and C (and by statement 3, this ordering is 
unique except for a possible over-all reversal to form [CBA]). 


Affine geometry (page 43): 
In addition to O1-O4, postulate the following axioms: 


A1 Constructibility of parallelograms: Given any P, Q, and R, there exists S such that [PQRS], 
and if P, Q, and R are distinct then S is unique. 


A2 Symmetric treatment of the sides of a parallelogram: If [PQRS], then [QRSP], [QPSR], 
and [PRQS]. 


A3 Lines parallel to the same line are parallel to one another: If [ABCD] and [ABEF], then 
[CDEF]. 


Experimentally motivated statements about Lorentzian geometry (page 412): 


L1 Spacetime is homogeneous and isotropic. No point has special properties that make it 
distinguishable from other points, nor is one direction distinguishable from another. 


L2 Inertial frames of reference exist. These are frames in which particles move at constant 
velocity if not subject to any forces. We can construct such a frame by using a particular 
particle, which is not subject to any forces, as a reference point. 


* The statement of E5 given above is the form known as Playfair’s axiom, rather than the version of the postulate originally given by Euclid. 
L3 Equivalence of inertial frames: If a frame is in constant-velocity translational motion 
relative to an inertial frame, then it is also an inertial frame. No experiment can distinguish 
one inertial frame from another. 


L4 Causality: There exist events 1 and 2 such that $t_1 < t_2$ in all frames. 


L5 Relativity of time: There exist events 1 and 2 and frames of reference $(t, x)$ and $(t', x')$ 
such that $t_1 < t_2$, but $t'_1 > t'_2$. 
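
Statements L4 and L5 can be illustrated numerically. The following is a minimal sketch, assuming the standard one-dimensional Lorentz boost in units where c = 1 (the boost formula is not stated in this summary, and the event coordinates and function names are hypothetical choices made only for illustration). It checks whether the time ordering of two events survives a change of frame.

```python
import math


def boost(t, x, v):
    """Transform event (t, x) into a frame moving at velocity v (units with c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)


def time_order_preserved(event_a, event_b, v):
    """Return True if event_a is still earlier than event_b in the boosted frame."""
    t_a, _ = boost(event_a[0], event_a[1], v)
    t_b, _ = boost(event_b[0], event_b[1], v)
    return t_a < t_b


# Hypothetical events, written as (t, x).
event_1 = (0.0, 0.0)
timelike_2 = (2.0, 1.0)   # separated more in time than in space
spacelike_2 = (1.0, 2.0)  # separated more in space than in time

velocities = [-0.9, -0.5, 0.0, 0.5, 0.9]

# L4: for the timelike pair, the ordering should hold in every sampled frame.
print([time_order_preserved(event_1, timelike_2, v) for v in velocities])

# L5: for the spacelike pair, the ordering depends on the frame.
print([time_order_preserved(event_1, spacelike_2, v) for v in velocities])
```

Only a handful of velocities are sampled here, so the sketch illustrates the two statements rather than proving them.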


Statements of the equivalence principle: 


Accelerations and gravitational fields are equivalent. There is no experiment that can 
distinguish one from the other (page 24). 


It is always possible to define a local Lorentz frame in a particular neighborhood of 
spacetime (page 28). 


There is no way to associate a preferred tensor field with spacetime (page 142). 


Vectors 


Coordinates cannot in general be added on a manifold, so they don’t form a vector space, 
but infinitesimal coordinate differences can and do. The vector space in which the coordinate 
differences exist is a different space at every point, referred to as the tangent space at that point 
(see p. 262). 


Vectors are written in abstract index notation with upper indices, $x^a$, and are represented 
by column vectors, arrows, or birdtracks with incoming arrows, $\rightarrow x$. 


Dual vectors, also known as covectors or 1-forms, are written in abstract index notation with 
lower indices, $x_a$, and are represented by row vectors, ordered pairs of parallel lines (see p. 48), 
or birdtracks with outgoing arrows, $\leftarrow x$. 


In concrete-index notation, the $x^\mu$ are a list of numbers, referred to as the vector’s 
contravariant components, while $x_\mu$ would be the covariant components of a dual vector. 


Fundamentally the distinction between the two types of vectors is defined by the tensor 
transformation laws, p. 138. For example, an odometer reading is contravariant because 
converting it from kilometers to meters increases it. A temperature gradient is covariant because 
converting it from degrees/km to degrees/m decreases it. 
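
The odometer and temperature-gradient example can be checked with a few lines of arithmetic. This is a rough sketch with made-up numbers and hypothetical function names. It rescales a one-dimensional vector component and a dual-vector component from kilometers to meters, and confirms that their product (a temperature change) does not depend on the choice of units.

```python
import math

METERS_PER_KM = 1000.0


def vector_km_to_m(component_km):
    """A contravariant component scales up when the unit of length shrinks."""
    return component_km * METERS_PER_KM


def covector_per_km_to_per_m(component_per_km):
    """A covariant component scales down when the unit of length shrinks."""
    return component_per_km / METERS_PER_KM


# Illustrative values only.
odometer_km = 3.0            # distance traveled, in km
gradient_deg_per_km = 2.0    # temperature gradient, in degrees/km

odometer_m = vector_km_to_m(odometer_km)
gradient_deg_per_m = covector_per_km_to_per_m(gradient_deg_per_km)

# The temperature change is a scalar: the two scale factors cancel.
change_in_km_units = gradient_deg_per_km * odometer_km
change_in_m_units = gradient_deg_per_m * odometer_m

print(odometer_m, gradient_deg_per_m)
print(change_in_km_units, change_in_m_units)
```

The opposite scaling of the two factors is what leaves the temperature change unchanged when the units are switched.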


In the absence of a metric, every physical quantity has a definite vector or dual vector 
character. Infinitesimal coordinate differences $dx^a$ and velocities $dx^a/d\tau$ are vectors, while 
momentum $p_a$ and force $F_a$ are dual (see p. 141). Many ordinary and interesting real-world 
systems lack a metric (see p. 49). When a metric is present, we can raise and lower indices at 
will. There is a perfect duality symmetry between the two types of vectors, but this symmetry 
is broken by the convention that a measurement with a ruler is a $\Delta x^a$, not a $\Delta x_a$. 
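
Raising and lowering indices is easy to make concrete. The sketch below assumes the Minkowski metric in Cartesian coordinates with signature (+, -, -, -) and c = 1. That choice, the sample components, and all function names are assumptions made for illustration. Nothing here depends on special features of this metric beyond the fact that its inverse has the same components.

```python
# Assumed metric: Minkowski, signature (+, -, -, -), Cartesian coordinates, c = 1.
METRIC = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [0.0, 0.0, 0.0, -1.0],
]
# For this particular diagonal metric the inverse has identical components.
INVERSE_METRIC = METRIC


def lower_index(upper):
    """Compute v_a = g_ab v^b."""
    return [sum(METRIC[a][b] * upper[b] for b in range(4)) for a in range(4)]


def raise_index(lower):
    """Compute v^a = g^ab v_b."""
    return [sum(INVERSE_METRIC[a][b] * lower[b] for b in range(4)) for a in range(4)]


def contract(lower, upper):
    """Compute the scalar w_a v^a."""
    return sum(w * v for w, v in zip(lower, upper))


v_upper = [2.0, 1.0, 0.0, 0.0]          # illustrative contravariant components
v_lower = lower_index(v_upper)          # the spatial components change sign
v_round_trip = raise_index(v_lower)     # raising again recovers the original

print(v_lower)
print(v_round_trip)
print(contract(v_lower, v_upper))       # the squared magnitude v_a v^a
```

For a non-diagonal or curved-space metric, `INVERSE_METRIC` would have to be computed as a genuine matrix inverse rather than reused.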


For consistency with the transformation laws, differentiation with respect to a quantity flips 
the index, e.g., $\partial_\mu = \partial/\partial x^\mu$. The operators $\partial_\mu$ are often used as basis vectors for the tangent 
plane. In general, expressing vectors in a basis using the Einstein notation convention results 
in an ugly notational clash described on p. 265. 
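
The index flip can be checked with the chain rule. As a sketch, assume a smooth change of coordinates from $x^\mu$ to $x'^\mu$ (the primed coordinates are notation introduced here only for illustration):

```latex
\partial'_\mu
  = \frac{\partial}{\partial x'^\mu}
  = \frac{\partial x^\nu}{\partial x'^\mu}\,\frac{\partial}{\partial x^\nu}
  = \frac{\partial x^\nu}{\partial x'^\mu}\,\partial_\nu ,
\qquad
dx'^\mu = \frac{\partial x'^\mu}{\partial x^\nu}\,dx^\nu .
```

The derivative operator picks up the inverse of the factor that multiplies the coordinate differences, which is the behavior of a lower-index quantity.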